BAYES PROCEDURES FOR ADAPTIVE INFERENCE IN 
NONPARAMETRIC INVERSE PROBLEMS 



By B.T. Knapik, B.T. Szabo, A.W. van der Vaart and J.H. van Zanten

VU University Amsterdam, Eindhoven University of Technology, Leiden 
University and University of Amsterdam 

We study empirical and hierarchical Bayes approaches to the 
problem of estimating an infinite-dimensional parameter in mildly 
ill-posed inverse problems. We consider a class of prior distributions 
indexed by a hyperparameter that quantifies regularity. We prove 
that both methods we consider succeed in automatically selecting 
this parameter optimally, resulting in optimal convergence rates for 
truths with Sobolev or analytic "smoothness", without using knowledge
about this regularity. Both methods are illustrated by simulation
examples. 

1. Introduction. In recent years, Bayesian approaches have become 
more and more common in dealing with nonparametric statistical inverse 
problems. Such problems arise in many fields of applied science, including 
geophysics, genomics, medical image analysis and astronomy, to mention 
but a few. In nonparametric inverse problems some form of regularization is 
usually needed in order to estimate the (typically functional) parameter of 
interest. One possible explanation of the increasing popularity of Bayesian 
methods is the fact that assigning a prior distribution to an unknown func- 
tional parameter is a natural way of specifying a degree of regularization. 
Probably at least as important is the fact that various computational meth- 
ods exist to carry out the inference in practice, including MCMC methods 
and approximate methods like expectation propagation, Laplace approxima- 
tions and approximate Bayesian computation. A third important aspect that 
appeals to users of Bayes methods is that an implementation of a Bayesian 
procedure typically produces not only an estimate of the unknown quantity 
of interest (usually a posterior mean or mode), but also a large number of 
samples from the whole posterior distribution. These can then be used to 
report a credible set, i.e. a set of parameter values that receives a large fixed 
fraction of the posterior mass, that serves as a quantification of the
uncertainty in the estimate. Some examples of papers using Bayesian methods in
nonparametric inverse problems in various applied settings include [14], [23],
[19], [22], [2]. The paper [26] provides a nice overview and many additional
references.

Work on the fundamental properties of Bayes procedures for nonparamet- 
ric inverse problems, like consistency, (optimal) convergence rates, etcetera, 
has only started to appear recently. The few papers in this area include [18], 
[17], [13], [1]. This is in sharp contrast with the work on frequentist method- 
ology, which is quite well developed. See for instance the overviews given by 
Cavalier [7], [8]. 

Our focus in this paper is on the ability of Bayesian methods to achieve 
adaptive, rate-optimal inference in so-called mildly ill-posed nonparamet- 
ric inverse problems (in the terminology of, e.g., [7]). Nonparametric priors 
typically involve one or more tuning parameters, or hyper-parameters, that 
determine the degree of regularization. In practice there is widespread use 
of empirical Bayes and full, hierarchical Bayes methods to automatically 
select the appropriate values of such parameters. These methods are gen- 
erally considered to be preferable to methods that use only a single, fixed 
value of the hyper-parameters. In the inverse problem setting it is known 
from the recent paper [18] that using a fixed prior can indeed be undesir- 
able, since it can lead to convergence rates that are sub-optimal, unless by 
chance the statistician has selected a prior that captures the fine properties 
of the unknown parameter (like its degree of smoothness, if it is a function). 
Theoretical work that supports the preference for empirical or hierarchical 
Bayes methods does not exist at the present time however. It has until now 
been unknown whether these approaches can indeed robustify a procedure 
against prior mismatch. In this paper we answer this question in the affir- 
mative. We show that empirical and hierarchical Bayes methods can lead to 
adaptive, rate-optimal procedures in the context of nonparametric inverse 
problems, provided they are properly constructed. 

We study this problem in the context of the canonical signal-in-white- 
noise model, or, equivalently, the infinite-dimensional normal mean model. 
Using singular value decompositions many nonparametric, linear inverse 
problems can be cast in this form (e.g. [8], [18]). Specifically, we assume 
that we observe a sequence of noisy coefficients $Y = (Y_1, Y_2, \ldots)$ satisfying

(1.1)    $Y_i = \kappa_i \mu_i + \frac{1}{\sqrt{n}} Z_i, \qquad i = 1, 2, \ldots,$

where $Z_1, Z_2, \ldots$ are independent, standard normal random variables,
$\mu = (\mu_1, \mu_2, \ldots) \in \ell_2$ is the infinite-dimensional parameter of interest, and $(\kappa_i)$
is a known sequence that may converge to $0$ as $i \to \infty$, which complicates
the inference. We suppose the problem is mildly ill-posed of order $p > 0$, in
the sense that

(1.2)    $C^{-1} i^{-p} \le \kappa_i \le C i^{-p}, \qquad i = 1, 2, \ldots,$

for some $C > 1$. Minimax lower bounds for the rate of convergence of estimators
for $\mu$ are well known in this setting. For instance, the lower bound over
Sobolev balls of regularity $\beta > 0$ is of the order $n^{-\beta/(1+2\beta+2p)}$, and over
analytic balls it is of the order $n^{-1/2}(\log n)^{1/2+p}$.



There are several regularization methods which attain these rates, includ- 
ing classical Tikhonov regularization and Bayes procedures with Gaussian 
priors. 
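
To make the observation scheme (1.1)-(1.2) concrete, the following minimal sketch simulates the sequence model under some simplifying assumptions that are ours and not part of the paper: the exact decay $\kappa_i = i^{-p}$ (so $C = 1$), a truncation at finitely many coordinates, and a hypothetical truth $\mu_i = i^{-3/2}$. The function name `simulate_sequence_model` is illustrative only.

```python
import math
import random

def simulate_sequence_model(mu, n, p, rng):
    """Draw Y_i = kappa_i * mu_i + Z_i / sqrt(n), assuming kappa_i = i**(-p)."""
    ys = []
    for i, mu_i in enumerate(mu, start=1):
        kappa_i = i ** (-p)                      # exact polynomial decay (assumption)
        noise = rng.gauss(0.0, 1.0) / math.sqrt(n)
        ys.append(kappa_i * mu_i + noise)
    return ys

# Hypothetical truth mu_i = i^(-3/2), truncated at 200 coordinates.
rng = random.Random(0)
n, p = 10**6, 1.0
mu0 = [i ** -1.5 for i in range(1, 201)]
y = simulate_sequence_model(mu0, n, p, rng)

# Naive inversion Y_i / kappa_i carries noise of standard deviation i^p / sqrt(n),
# which grows with i: this growth is the ill-posedness of the problem.
naive_noise_sd = [i ** p / math.sqrt(n) for i in (1, 10, 100)]
```

The growing noise level in `naive_noise_sd` is the reason some form of regularization is needed before the coefficients can be inverted.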

Many of the older existing methods for nonparametric inverse problems 
are not adaptive, in the sense that they rely on knowledge of the regularity 
(e.g. in Sobolev sense) of the unknown parameter of interest to select the 
appropriate regularization. This also holds for the Bayesian approach with 
fixed Gaussian priors studied in, e.g., [18] and [1]. In the last decade however, 
several methods have been developed in the frequentist literature that achieve
the minimax convergence rate without such knowledge. This development
parallels the earlier work on adaptive methods for the direct nonparametric
problem (i.e. the case $p = 0$ in (1.2)) to some extent, although the inverse
case is technically usually more demanding. The adaptive methods typically 
involve a data-driven choice of a tuning parameter in order to automati- 
cally achieve an optimal bias-variance trade-off, as in Lepski's method for 
instance. 

For nonparametric inverse problems, the construction of an adaptive esti- 
mator based on a properly penalized blockwise Stein's rule has been studied 
in [11], cf. also [5]. This estimator is adaptive both over Sobolev and analytic 
scales. In [9] the data-driven choice of the regularizing parameters is based 
on unbiased risk estimation. The authors consider projection estimators and 
derive the corresponding oracle inequalities. For $\mu$ in the Sobolev scale they
obtain asymptotically sharp adaptation in a minimax sense, whereas for $\mu$
in the analytic scale, their rate is optimal up to a logarithmic term. Yet another
approach to adaptation in inverse problems is the risk hull method studied
in [10]. In this paper the authors consider spectral cut-off estimators and
provide oracle inequalities. An extension of their approach is presented in
[20]. The link between the penalized blockwise Stein's rule and the risk hull
method is presented in [21]. 

Adaptation properties of Bayes procedures for mildly ill-posed nonparametric
inverse problems have until now not been studied in the literature.
Results are only available for the direct problem, i.e. the case that $\kappa_i = 1$
for every $i$, or, equivalently, $p = 0$ in (1.2). In the paper [4] it is shown that
in this case adaptive Bayesian inference is possible using a hierarchical, con- 
ditionally Gaussian prior. Other recent papers also exhibit priors that yield 
rate-adaptive procedures in the direct signal-in-white-noise problem (see for 
instance [28], [12], [25]), but it is important to note that these papers use
general theorems on contraction rates for posterior distributions (as given in
[15] for instance) that are not suitable to deal with the truly ill-posed case
in which $\kappa_i \to 0$ as $i \to \infty$. The reason is that if these general theorems are
applied in the inverse case, we only obtain convergence rates relative to the
norm $\mu \mapsto \big(\sum_i \kappa_i^2 \mu_i^2\big)^{1/2}$, which is not very interesting. Obtaining rates relative
to the $\ell_2$-norm is much more involved and requires a different approach.

To obtain rate-adaptive Bayes procedures for the model (1.1) we consider
a family $(\Pi_\alpha : \alpha > 0)$ of Gaussian priors for the parameter $\mu$. These priors are
indexed by a parameter $\alpha > 0$ which quantifies the "regularity" of the prior
$\Pi_\alpha$ (details in Section 2). Instead of choosing a fixed value for $\alpha$ (which is the
approach studied in [18]) we view it as a tuning parameter, or hyper-parameter, and
consider two different methods for selecting it in a data-driven manner. The
approach typically preferred by Bayesian statisticians is to endow the
hyper-parameter with a prior distribution itself. This results in a full, hierarchical
Bayes procedure. The paper [4] follows the same approach in the direct
problem. We prove that under a mild assumption on the hyper-prior on $\alpha$, we
obtain an adaptive procedure for the inverse problem using the hierarchical
prior. Optimal convergence rates are obtained (up to lower order factors),
uniformly over Sobolev and analytic scales.

A second approach we study consists in first "estimating" $\alpha$ from the
data and then substituting the estimator $\hat\alpha_n$ for $\alpha$ in the posterior distribution
for $\mu$ corresponding to the prior $\Pi_\alpha$. This empirical Bayes procedure
is not really Bayesian in the strict sense of the word. However, for compu- 
tational reasons empirical Bayes methods of this type are widely used in 
practice, making it relevant to study their theoretical performance. Rigor- 
ous results about the asymptotic behavior of empirical Bayes selectors of 
hyper-parameters in infinite-dimensional problems only exist for a limited 
number of special problems, see e.g. [3], [30], [16]. In this paper we prove 
that the likelihood-based empirical Bayes method that we propose has the 
same desirable adaptation and rate-optimality properties in nonparametric 
inverse problems as the hierarchical Bayes approach. 

The estimator $\hat\alpha_n$ for $\alpha$ that we propose is the commonly used likelihood-based
empirical Bayes estimator for the hyper-parameter. Concretely, it is
the maximum likelihood estimator for $\alpha$ in the model in which the data is
generated by first drawing $\mu$ from $\Pi_\alpha$ and then generating $Y = (Y_1, Y_2, \ldots)$
according to (1.1), i.e.

(1.3)    $\mu \mid \alpha \sim \Pi_\alpha, \quad \text{and} \quad Y \mid (\mu, \alpha) \sim \bigotimes_{i=1}^\infty N\big(\kappa_i \mu_i, \tfrac{1}{n}\big).$

A crucial element in the proof of the adaptation properties of both procedures
we consider is understanding the asymptotic behavior of $\hat\alpha_n$. In
contrast to the typical situation in parametric models (see [24]) this turns
out to be rather delicate, since the likelihood for $\alpha$ can have complicated
behavior. We are able however to derive deterministic asymptotic lower and
upper bounds for $\hat\alpha_n$. In general these depend on the true parameter $\mu_0$ in
a very complicated way. To get some insight into why our procedures work
we show that if the true parameter has nice regular behavior of the form
$\mu_{0,i} \asymp i^{-1/2-\beta}$ for some $\beta > 0$, then $\hat\alpha_n$ is essentially a consistent estimator
for $\beta$ (see Lemma 2.1). This means that in some sense, the estimator $\hat\alpha_n$
correctly "estimates the regularity" of the true parameter (see [3] for work
in a similar direction). Since the empirical Bayes procedure basically chooses
the data-dependent prior $\Pi_{\hat\alpha_n}$ for $\mu$, this means that asymptotically, the procedure
automatically succeeds in selecting among the priors $\Pi_\alpha$, $\alpha > 0$, the
one for which the regularity of the prior and the truth are matched. This results
in an optimal bias-variance trade-off and hence in optimal convergence
rates.

The remainder of the paper is organized as follows. In Section 2 we first 
describe the empirical and hierarchical Bayes procedures in detail. Then we 
present a theorem on the asymptotic behavior of the estimator $\hat\alpha_n$ for the
hyper-parameter, followed by two results on the adaptation and rate of contraction
of the empirical and hierarchical Bayes posteriors over Sobolev and analytic 
scales. The two approaches are illustrated numerically in Section 3. We apply 
them to simulated data from an inverse signal-in-white-noise problem, where 
the problem is to recover a signal from a noisy observation of its primitive. 
Proofs of the main results are presented in Sections 4-7. Some auxiliary 
lemmas are collected in Section 8. 

1.1. Notation. For $\beta > 0$ and $\gamma > 0$, the Sobolev norm $\|\mu\|_\beta$, the analytic
norm $\|\mu\|_{A^\gamma}$ and the $\ell_2$-norm $\|\mu\|$ of an element $\mu \in \ell_2$ are defined by

$\|\mu\|_\beta^2 = \sum_{i=1}^\infty i^{2\beta} \mu_i^2, \qquad \|\mu\|^2 = \sum_{i=1}^\infty \mu_i^2, \qquad \|\mu\|_{A^\gamma}^2 = \sum_{i=1}^\infty e^{2\gamma i} \mu_i^2,$

and the corresponding Sobolev space by $S^\beta = \{\mu \in \ell_2 : \|\mu\|_\beta < \infty\}$, and the
analytic space by $A^\gamma = \{\mu \in \ell_2 : \|\mu\|_{A^\gamma} < \infty\}$.

For two sequences $(a_n)$ and $(b_n)$ of numbers, $a_n \asymp b_n$ means that $|a_n/b_n|$
is bounded away from zero and infinity as $n \to \infty$, $a_n \lesssim b_n$ means that
$a_n/b_n$ is bounded, $a_n \sim b_n$ means that $a_n/b_n \to 1$ as $n \to \infty$, and $a_n \ll b_n$
means that $a_n/b_n \to 0$ as $n \to \infty$. For two real numbers $a$ and $b$, we denote
by $a \vee b$ their maximum, and by $a \wedge b$ their minimum.

2. Main results. 

2.1. Description of the empirical and hierarchical Bayes procedures. We 
assume that we observe the sequence of noisy coefficients $Y = (Y_1, Y_2, \ldots)$
satisfying (1.1), for $Z_1, Z_2, \ldots$ independent, standard normal random variables,
$\mu = (\mu_1, \mu_2, \ldots) \in \ell_2$, and a known sequence $(\kappa_i)$ satisfying (1.2)
for some $p > 0$ and $C > 1$. We denote the distribution of the sequence
$Y$ corresponding to the "true" parameter $\mu_0$ by $P_0$, and the corresponding
expectation by $E_0$.

For $\alpha > 0$, consider the product prior $\Pi_\alpha$ on $\ell_2$ given by

(2.1)    $\Pi_\alpha = \bigotimes_{i=1}^\infty N(0, i^{-1-2\alpha}).$

It is easy to see that this prior is "$\alpha$-regular", in the sense that for every
$\alpha' < \alpha$, it assigns mass 1 to the Sobolev space $S^{\alpha'}$. In [18] it was proved that
if for the true parameter $\mu_0$ we have $\mu_0 \in S^\beta$ for $\beta > 0$, then the posterior
distribution corresponding to the Gaussian prior $\Pi_\alpha$ contracts around $\mu_0$ at
the optimal rate $n^{-\beta/(1+2\beta+2p)}$ if $\alpha = \beta$. If $\alpha \ne \beta$, only sub-optimal rates
are attained in general (cf. [6]). In other words, when using a Gaussian prior
with a fixed regularity, optimal convergence rates are obtained if and only 
if the regularity of the prior and the truth are matched. Since the latter 
is unknown however, choosing the prior that is optimal from the point of 
view of convergence rates is typically not possible in practice. Therefore, we 
consider two data-driven methods for selecting the regularity of the prior. 

The first is a likelihood-based empirical Bayes method, which attempts
to estimate the appropriate value of the hyper-parameter $\alpha$ from the data.
In the Bayesian setting described by the conditional distributions (1.3), it
holds that

$Y \mid \alpha \sim \bigotimes_{i=1}^\infty N\big(0, \, i^{-1-2\alpha}\kappa_i^2 + \tfrac{1}{n}\big).$

The corresponding log-likelihood for $\alpha$ (relative to an infinite product of
$N(0, 1/n)$-distributions) is easily seen to be given by

(2.2)    $\ell_n(\alpha) = -\frac{1}{2} \sum_{i=1}^\infty \Big( \log\Big(1 + \frac{n}{i^{1+2\alpha}\kappa_i^{-2}}\Big) - \frac{n^2}{i^{1+2\alpha}\kappa_i^{-2} + n} Y_i^2 \Big).$

The idea is to "estimate" $\alpha$ by the maximizer of $\ell_n$. The results ahead
(Lemma 2.1 and Theorem 2.2) imply that with $P_0$-probability tending to
one, $\ell_n$ has a global maximum on $[0, \log n)$ if $\mu_{0,i} \ne 0$ for some $i \ge 2$. (In
fact, the cited results imply the maximum is attained on the slightly smaller
interval $[0, (\log n)/(2\log 2) - 1/2 - p]$.) If the latter condition is not satisfied
(if $\mu_0 = 0$ for instance), $\ell_n$ may attain its maximum only at $\infty$. Therefore,
we truncate the maximizer at $\log n$ and define

$\hat\alpha_n = \operatorname*{argmax}_{\alpha \in [0, \log n]} \ell_n(\alpha).$

The continuity of $\ell_n$ ensures the argmax exists. If it is not unique, any value
may be chosen. We will always assume at least that $\mu_0$ has Sobolev regularity
of some order $\beta > 0$. Lemma 2.1 and Theorem 2.2 imply that in this case
$\hat\alpha_n > 0$ with probability tending to 1. An alternative to the truncation of
the argmax of $\ell_n$ at $\log n$ could be to extend the definition of the priors $\Pi_\alpha$
to include the case $\alpha = \infty$. The prior $\Pi_\infty$ should then be defined as the
product $N(0, 1) \otimes \delta_0 \otimes \delta_0 \otimes \cdots$, with $\delta_0$ the Dirac measure concentrated at
$0$. However, from a practical perspective it is more convenient to define $\hat\alpha_n$
as above.
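
The definition of the estimator can be sketched numerically as follows. This is only an illustration under assumptions of ours: the sum in (2.2) is truncated at the length of the data vector, $\kappa_i = i^{-p}$ holds exactly, and the argmax over $[0, \log n]$ is approximated on a uniform grid. The names `log_likelihood` and `empirical_bayes_alpha` are hypothetical.

```python
import math
import random

def log_likelihood(alpha, y, n, p):
    """Truncated version of (2.2), assuming kappa_i = i**(-p)."""
    total = 0.0
    for i, y_i in enumerate(y, start=1):
        a_i = i ** (1 + 2 * alpha + 2 * p)       # i^{1+2 alpha} * kappa_i^{-2}
        total += math.log1p(n / a_i) - n * n / (a_i + n) * y_i * y_i
    return -0.5 * total

def empirical_bayes_alpha(y, n, p, grid_size=400):
    """Grid approximation of the argmax of ell_n over [0, log n]."""
    upper = math.log(n)
    grid = [upper * k / grid_size for k in range(grid_size + 1)]
    return max(grid, key=lambda a: log_likelihood(a, y, n, p))

# Simulated data with a hypothetical truth mu_{0,i} = i^(-3/2) sin(i).
rng = random.Random(1)
n, p = 10**6, 1.0
mu0 = [i ** -1.5 * math.sin(i) for i in range(1, 301)]
y = [i ** -p * m + rng.gauss(0.0, 1.0) / math.sqrt(n)
     for i, m in enumerate(mu0, start=1)]
alpha_hat = empirical_bayes_alpha(y, n, p)

# Degenerate input Y = 0: ell_n is increasing in alpha, so the grid
# maximizer sits at the truncation point log n.
alpha_degenerate = empirical_bayes_alpha([0.0] * 50, n, p)
```

The degenerate case illustrates why the truncation at $\log n$ is part of the definition: without it the maximizer would escape to infinity.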

The empirical Bayes procedure consists in computing the posterior distribution
of $\mu$ corresponding to a fixed prior $\Pi_\alpha$ and then substituting $\hat\alpha_n$
for $\alpha$. Under the model described above and the prior (2.1) the coordinates
$(\mu_{0,i}, Y_i)$ of the vector $(\mu_0, Y)$ are independent, and hence the conditional
distribution of $\mu_0$ given $Y$ factorizes over the coordinates as well. The computation
of the posterior distribution reduces to countably many posterior
computations in conjugate normal models. Therefore (see also [18]) the posterior
distribution corresponding to the prior $\Pi_\alpha$ is given by

(2.3)    $\Pi_\alpha(\cdot \mid Y) = \bigotimes_{i=1}^\infty N\Big( \frac{n\kappa_i^{-1}}{i^{1+2\alpha}\kappa_i^{-2} + n} Y_i, \; \frac{\kappa_i^{-2}}{i^{1+2\alpha}\kappa_i^{-2} + n} \Big).$

Then the empirical Bayes posterior is the random measure $\Pi_{\hat\alpha_n}(\cdot \mid Y)$ defined
by

(2.4)    $\Pi_{\hat\alpha_n}(B \mid Y) = \Pi_\alpha(B \mid Y)\big|_{\alpha = \hat\alpha_n}$
for measurable subsets $B \subset \ell_2$. Note that the construction of the empirical
Bayes posterior does not use information about the regularity of the true
parameter. In Theorem 2.3 below we prove that it contracts around the
truth at an optimal rate (up to lower order factors), uniformly over Sobolev
and analytic scales.

The second method we consider is a full, hierarchical Bayes approach
where we put a prior distribution on the hyper-parameter $\alpha$. We use a prior
on $\alpha$ with a positive Lebesgue density $\lambda$ on $(0, \infty)$. The full, hierarchical
prior for $\mu$ is then given by

(2.5)    $\Pi = \int_0^\infty \lambda(\alpha) \Pi_\alpha \, d\alpha.$

In Theorem 2.5 below we prove that under mild assumptions on the prior
density $\lambda$, the corresponding posterior distribution $\Pi(\cdot \mid Y)$ has the same
desirable asymptotic properties as the empirical Bayes posterior (2.4).
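
The conjugate update behind (2.3), and hence the plug-in posterior (2.4), can be sketched in a few lines. The sketch assumes $\kappa_i = i^{-p}$ exactly and a finite data vector; `posterior_moments` is a hypothetical helper name, and the value passed as `alpha` would be the estimate $\hat\alpha_n$ for the empirical Bayes posterior.

```python
def posterior_moments(alpha, y, n, p):
    """Coordinatewise posterior means and variances from (2.3), kappa_i = i**(-p)."""
    means, variances = [], []
    for i, y_i in enumerate(y, start=1):
        kappa = i ** (-p)
        denom = i ** (1 + 2 * alpha) / kappa**2 + n   # i^{1+2 alpha} kappa_i^{-2} + n
        means.append(n / kappa * y_i / denom)
        variances.append(1.0 / kappa**2 / denom)
    return means, variances

# Cross-check for coordinate i = 2 against the textbook normal-normal update:
# prior variance 2^(-3), noise variance 1/n, kappa_2 = 1/2.
means, variances = posterior_moments(alpha=1.0, y=[0.0, 0.3], n=100, p=1.0)
precision = 2.0**3 + 100 * 0.5**2
textbook_var = 1.0 / precision
textbook_mean = textbook_var * 100 * 0.5 * 0.3
```

Since the posterior is a product of normals, credible sets and posterior draws can be generated coordinate by coordinate from these two lists.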

2.2. Adaptation and contraction rates. Understanding of the asymptotic
behavior of the maximum likelihood estimator $\hat\alpha_n$ is a crucial element in
our proofs of the contraction rate results for the empirical and hierarchical
Bayes procedures. The estimator somehow estimates the regularity of the
true parameter $\mu_0$, but in a rather indirect and involved manner in general.
Our first theorem gives deterministic upper and lower bounds for $\hat\alpha_n$, whose
construction involves the function $h_n : (0, \infty) \to [0, \infty)$ defined by

(2.6)    $h_n(\alpha) = \frac{1+2\alpha+2p}{n^{1/(1+2\alpha+2p)} \log n} \sum_{i=1}^\infty \frac{n^2 i^{1+2\alpha} \mu_{0,i}^2 \log i}{(i^{1+2\alpha}\kappa_i^{-2} + n)^2}.$

For positive constants $0 < l < L$ we define the lower and upper bounds as

(2.7)    $\underline{\alpha}_n = \inf\{\alpha > 0 : h_n(\alpha) > l\} \wedge \sqrt{\log n},$

(2.8)    $\overline{\alpha}_n = \inf\{\alpha > 0 : h_n(\alpha) > L(\log n)^2\}.$

One can see that the function $h_n$ and hence the lower and upper bounds $\underline{\alpha}_n$
and $\overline{\alpha}_n$ depend on the true $\mu_0$. We show in Theorem 2.2 that the maximum
likelihood estimator $\hat\alpha_n$ is between these bounds with probability tending to
one. In general the true $\mu_0$ can have very complicated tail behavior, which
makes it difficult to understand the behavior of the upper and lower bounds.
If $\mu_0$ has regular tails however, we can get some insight in the nature of the
bounds. We have the following lemma, proved in Section 4.

Lemma 2.1. For any $l, L > 0$ in the definitions (2.7)-(2.8) the following
statements hold.

(i) For all $\beta, R > 0$, there exists $c_0 > 0$ such that

$\inf_{\|\mu_0\|_\beta \le R} \underline{\alpha}_n \ge \beta - \frac{c_0}{\log n}$

for $n$ large enough.

(ii) For all $\gamma, R > 0$,

$\inf_{\|\mu_0\|_{A^\gamma} \le R} \underline{\alpha}_n \ge \frac{\sqrt{\log n}}{\log\log n}$

for $n$ large enough.

(iii) If $\mu_{0,i} \ge c\, i^{-\gamma-1/2}$ for some $c, \gamma > 0$, then for a constant $C_0 > 0$ only
depending on $c$ and $\gamma$, we have $\overline{\alpha}_n \le \gamma + C_0(\log\log n)/\log n$ for all $n$
large enough.

(iv) If $\mu_{0,i} \ne 0$ for some $i \ge 2$, then $\overline{\alpha}_n \le (\log n)/(2\log 2) - 1/2 - p$ for $n$
large enough.

We note that items (i) and (iii) of the lemma imply that if $\mu_{0,i} \asymp i^{-1/2-\beta}$,
then the interval $[\underline{\alpha}_n, \overline{\alpha}_n]$ concentrates around the value $\beta$ asymptotically.
In combination with Theorem 2.2 this shows that at least in this regular
case, $\hat\alpha_n$ correctly estimates the regularity of the truth. The same is true in
the analytic case, since item (ii) of the lemma shows that $\underline{\alpha}_n \to \infty$ in that
case, i.e. asymptotically, the procedure detects the fact that $\mu_0$ has infinite
regularity.

Item (iv) implies that if $\mu_{0,i} \ne 0$ for some $i \ge 2$, then $\overline{\alpha}_n < \infty$ for large $n$.
Conversely, the definitions of $h_n$ and $\overline{\alpha}_n$ show that if $\mu_{0,i} = 0$ for all $i \ge 2$,
then $h_n \equiv 0$ and hence $\overline{\alpha}_n = \infty$.
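
The function $h_n$ and the two bounds are easy to evaluate numerically, which may help to build intuition for the lemma. The sketch below is an illustration under assumptions of ours: $\kappa_i = i^{-p}$, a truth with finitely many non-zero coordinates, a grid search in place of the infimum, and the arbitrary constants $l = L = 1$. The names `h_n` and `first_crossing` are hypothetical.

```python
import math

def h_n(alpha, mu0, n, p):
    """Truncated version of h_n in (2.6), assuming kappa_i = i**(-p)."""
    r = 1 + 2 * alpha + 2 * p
    total = 0.0
    for i, m in enumerate(mu0, start=1):
        total += n * n * i ** (1 + 2 * alpha) * m * m * math.log(i) / (i ** r + n) ** 2
    return r / (n ** (1 / r) * math.log(n)) * total

def first_crossing(level, mu0, n, p, upper, step=0.01):
    """Smallest grid point alpha in (0, upper] with h_n(alpha) > level, else None."""
    k = 1
    while k * step <= upper:
        if h_n(k * step, mu0, n, p) > level:
            return k * step
        k += 1
    return None

n, p = 10**6, 1.0
mu0 = [i ** -1.5 for i in range(1, 501)]          # regular tails with beta = 1
cross_l = first_crossing(1.0, mu0, n, p, upper=math.log(n))
cap = math.sqrt(math.log(n))
lower_bound = cap if cross_l is None else min(cross_l, cap)            # cf. (2.7)
upper_bound = first_crossing(math.log(n) ** 2, mu0, n, p, upper=math.log(n))  # cf. (2.8)

# If only the first coordinate is non-zero, log(1) = 0 forces h_n to vanish,
# so the upper bound is infinite (represented here by None).
h_trivial = h_n(1.0, [1.0], n, p)
no_crossing = first_crossing(1.0, [1.0], n, p, upper=math.log(n))
```

A `None` return value plays the role of $\overline{\alpha}_n = \infty$ in the converse statement above.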

The following theorem asserts that the point(s) where $\ell_n$ is maximal
is (are) asymptotically between the bounds just defined, uniformly over
Sobolev and analytic scales. The proof is given in Section 5.

Theorem 2.2. For every $R > 0$ the constants $l$ and $L$ in (2.7) and (2.8)
can be chosen such that

$\inf_{\mu_0 \in B(R)} P_0\Big( \operatorname*{argmax}_{\alpha \ge 0} \ell_n(\alpha) \in [\underline{\alpha}_n, \overline{\alpha}_n] \Big) \to 1,$

where $B(R) = \{\mu_0 \in \ell_2 : \|\mu_0\|_\beta \le R\}$ or $B(R) = \{\mu_0 \in \ell_2 : \|\mu_0\|_{A^\gamma} \le R\}$.

With the help of Theorem 2.2 we can prove the following theorem, which 
states that the empirical Bayes posterior distribution (2.4) achieves optimal 
minimax contraction rates up to a slowly varying factor, uniformly over 
Sobolev and analytic scales. 

Theorem 2.3. For every $\beta, \gamma, R > 0$ and $M_n \to \infty$ we have

$\sup_{\|\mu_0\|_\beta \le R} E_0 \Pi_{\hat\alpha_n}\big( \|\mu - \mu_0\| \ge M_n L_n n^{-\beta/(1+2\beta+2p)} \mid Y \big) \to 0$

and

$\sup_{\|\mu_0\|_{A^\gamma} \le R} E_0 \Pi_{\hat\alpha_n}\big( \|\mu - \mu_0\| \ge M_n L_n (\log n)^{1/2+p} n^{-1/2} \mid Y \big) \to 0,$

where $(L_n)$ is a slowly varying sequence.

So indeed we see that both in the Sobolev and analytic cases, we
obtain the optimal minimax rates up to a slowly varying factor. The
proofs of the statements (given in Section 6) show that in the first case
we can take $L_n = (\log n)^{3/2}(\log\log n)^{1/2}$ and in the second case
$L_n = (\log n)^{(1/2+p)\sqrt{\log n}/2 + 1 - p}(\log\log n)^{1/2}$. These
sequences tend to infinity, but they are slowly varying, hence they grow
more slowly than any power of $n$.

The full Bayes procedure using the hierarchical prior (2.5) achieves the
same results as the empirical Bayes method, under mild assumptions on the
prior density $\lambda$ for $\alpha$.

Assumption 2.4. Assume that for every $c_1 > 0$ there exist $c_2, c_3 > 0$
and $c_4 > 1$ such that

$c_4^{-1}\alpha^{-c_2} \le \lambda(\alpha) \le c_4\alpha^{-c_2} \quad \text{or} \quad c_4^{-1}\exp(-c_3\alpha) \le \lambda(\alpha) \le c_4\exp(-c_3\alpha)$

for $\alpha \ge c_1$.

One can see that many distributions satisfy this assumption. Careful
inspection of the proof of the following theorem, given in Section 7, can lead
to weaker assumptions, although these will be less attractive to formulate.
Recall the notation $\Pi(\cdot \mid Y)$ for the posterior corresponding to the hierarchical
prior (2.5).

Theorem 2.5. Suppose the prior density $\lambda$ satisfies Assumption 2.4.
Then for every $\beta, \gamma, R > 0$ and $M_n \to \infty$ we have

$\sup_{\|\mu_0\|_\beta \le R} E_0 \Pi\big( \|\mu - \mu_0\| \ge M_n L_n n^{-\beta/(1+2\beta+2p)} \mid Y \big) \to 0$

and

$\sup_{\|\mu_0\|_{A^\gamma} \le R} E_0 \Pi\big( \|\mu - \mu_0\| \ge M_n L_n (\log n)^{1/2+p} n^{-1/2} \mid Y \big) \to 0,$

where $(L_n)$ is a slowly varying sequence.

The hierarchical Bayes method thus yields exactly the same rates as the
empirical Bayes method, and therefore the interpretation of this theorem is the
same as before.

3. Numerical illustration. Consider the inverse signal-in-white-noise
problem where we observe the process $(Y_t : t \in [0, 1])$ given by

$Y_t = \int_0^t \int_0^s \mu(u) \, du \, ds + \frac{1}{\sqrt{n}} W_t,$

with $W$ a standard Brownian motion, and the aim is to recover the function
$\mu$. If, slightly abusing notation, we define $Y_i = \int_0^1 e_i(t) \, dY_t$, for $e_i$ the
orthonormal basis functions given by $e_i(t) = \sqrt{2}\cos((i - 1/2)\pi t)$, then it is easily
verified that the observations $Y_i$ satisfy (1.1), with $\kappa_i^2 = ((i - 1/2)^2\pi^2)^{-1}$,
i.e. $p = 1$ in (1.2), and $\mu_i$ the Fourier coefficients of $\mu$ relative to the basis
$(e_i)$.
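
The claim that these singular values are mildly ill-posed of order $p = 1$ can be checked directly. The following small sketch verifies the two-sided bound (1.2) on the first 10000 indices; the choice $C = \pi$ is ours (any constant of at least $\pi$ would do), and `kappa` is a hypothetical helper name.

```python
import math

def kappa(i):
    """Singular value kappa_i = 1 / ((i - 1/2) * pi) of the example."""
    return 1.0 / ((i - 0.5) * math.pi)

# i * kappa_i decreases from 2/pi (at i = 1) towards 1/pi, so (1.2) holds
# with p = 1 and, for instance, C = pi.
ratios = [i * kappa(i) for i in range(1, 10001)]
C = math.pi
holds = all(1.0 / C <= r <= C for r in ratios)
```

The ratio $i\kappa_i$ stays between $1/\pi$ and $2/\pi$, so the polynomial decay of order one is exact up to constants.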

We consider simulated data from this model for $\mu_0$ the function with
Fourier coefficients $\mu_{0,i} = i^{-3/2}\sin(i)$, so we have a truth which essentially
has regularity 1. In the following figure we plot the true function $\mu_0$ (black
curve) and the empirical Bayes posterior mean (red curve) in the left panels,
and the corresponding normalized likelihood $\exp(\ell_n)/\max(\exp(\ell_n))$ in the
right panels (we truncated the sum in (2.2) at a high level). Figure 1 shows
the results for the empirical Bayes procedure with simulated data for
$n = 10^3, 10^5, 10^7, 10^9$, and $10^{11}$, from top to bottom. The figure shows that the
estimator $\hat\alpha_n$ does a good job in this case at estimating the regularity level
1, at least for large enough $n$. We also see however that due to the
ill-posedness of the problem, a large signal-to-noise ratio $n$ is necessary for
accurate recovery of the function $\mu$.

We applied the hierarchical Bayes method to the simulated data as well.
We chose a standard exponential prior distribution on $\alpha$, which satisfies
Assumption 2.4. Since the posterior cannot be computed explicitly, we
implemented an MCMC algorithm that generates (approximate) draws from
the posterior distribution of the pair $(\alpha, \mu)$. More precisely, we fixed a large
index $J \in \mathbb{N}$ and defined the vector $\mu^J = (\mu_1, \ldots, \mu_J)$ consisting of the first
$J$ coefficients of $\mu$. Then we devised a Metropolis-within-Gibbs algorithm for
sampling from the posterior distribution of $(\alpha, \mu^J)$ (e.g. [27]). The algorithm
alternates between draws from the conditional distribution $\mu^J \mid \alpha, Y$ and the
conditional distribution $\alpha \mid \mu^J, Y$. The former is explicitly given by (2.3). To
sample from $\alpha \mid \mu^J, Y$ we used a standard Metropolis-Hastings step. It is easily
verified that the Metropolis-Hastings acceptance probability for a move from
$(\alpha, \mu)$ to $(\alpha', \mu)$ is given by

$1 \wedge \frac{q(\alpha' \mid \alpha) \, p(\mu^J \mid \alpha') \, \lambda(\alpha')}{q(\alpha \mid \alpha') \, p(\mu^J \mid \alpha) \, \lambda(\alpha)},$

where $p(\cdot \mid \alpha)$ is the density of $\mu^J$ if $\mu \sim \Pi_\alpha$, i.e.

$p(\mu^J \mid \alpha) \propto \prod_{j=1}^J j^{1/2+\alpha} e^{-\frac{1}{2} j^{1+2\alpha}\mu_j^2},$

and $q$ is the transition kernel of the proposal chain. We used a proposal chain
that, if it is currently at location $\alpha$, moves to a new $N(\alpha, \sigma^2)$-distributed
location provided the latter is positive. We omit further details; the
implementation is straightforward.

Fig 1. Left panels: the empirical Bayes posterior mean (red) and the true curve (black).
Right panels: corresponding normalized likelihood for $\alpha$. We have $n = 10^3, 10^5, 10^7, 10^9$,
and $10^{11}$, from top to bottom.
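
Since the implementation details are omitted, the following is only one possible reading of the Metropolis-within-Gibbs scheme described above, not the authors' code. It assumes $\kappa_j = j^{-p}$, the standard exponential hyper-prior, and a random-walk proposal that is rejected whenever it is not positive, so that the proposal densities cancel in the acceptance ratio. The step size `sigma`, the starting value and all function names are hypothetical choices.

```python
import math
import random

def log_prior_density(mu, alpha):
    """log p(mu^J | alpha) up to a constant, for mu_j ~ N(0, j^(-1-2 alpha))."""
    return sum((0.5 + alpha) * math.log(j) - 0.5 * j ** (1 + 2 * alpha) * m * m
               for j, m in enumerate(mu, start=1))

def gibbs_mu(alpha, y, n, p, rng):
    """Draw mu^J | alpha, Y coordinatewise from (2.3), with kappa_j = j**(-p)."""
    draw = []
    for j, y_j in enumerate(y, start=1):
        denom = j ** (1 + 2 * alpha + 2 * p) + n
        mean = n * j ** p * y_j / denom
        sd = math.sqrt(j ** (2 * p) / denom)
        draw.append(rng.gauss(mean, sd))
    return draw

def mh_alpha(alpha, mu, sigma, rng):
    """One random-walk Metropolis-Hastings step for alpha | mu^J, Y."""
    proposal = rng.gauss(alpha, sigma)
    if proposal <= 0:                      # move only if the proposal is positive
        return alpha
    # Exp(1) hyper-prior: log lambda(a) = -a. The symmetric proposal cancels.
    log_ratio = (log_prior_density(mu, proposal) - proposal
                 - log_prior_density(mu, alpha) + alpha)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return alpha

def sampler(y, n, p, n_iter, sigma=0.1, seed=0):
    """Alternate the two conditional updates and record the alpha chain."""
    rng = random.Random(seed)
    alpha, alphas = 1.0, []
    for _ in range(n_iter):
        mu = gibbs_mu(alpha, y, n, p, rng)
        alpha = mh_alpha(alpha, mu, sigma, rng)
        alphas.append(alpha)
    return alphas

# Simulated data with J = 50 coefficients of the truth i^(-3/2) sin(i).
data_rng = random.Random(2)
n, p = 10**4, 1.0
y = [j ** -p * j ** -1.5 * math.sin(j) + data_rng.gauss(0.0, 1.0) / math.sqrt(n)
     for j in range(1, 51)]
alphas = sampler(y, n, p, n_iter=200)
```

In practice one would run far more iterations, discard a burn-in period and tune `sigma`; the short chain here only demonstrates the mechanics.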

The results for the hierarchical Bayes procedure are given in Figure 2.
The figure shows the results for simulated data with $n = 10^3, 10^5, 10^7, 10^9$
and $10^{11}$, from top to bottom. Every time we see the posterior mean (in red)
and the true curve (black) on the left and a histogram for the posterior of $\alpha$
on the right. The results are comparable to what we found for the empirical
Bayes procedure.

4. Proof of Lemma 2.1. In the proofs we assume for brevity that
we have the exact equality $\kappa_i = i^{-p}$. Dealing with the general case (1.2) is
straightforward, but makes the proofs somewhat lengthier.

Fig 2. Left panels: the hierarchical Bayes posterior mean (red) and the true curve (black).
Right panels: histograms of posterior for $\alpha$. We have $n = 10^3, 10^5, 10^7, 10^9$, and $10^{11}$, from
top to bottom.



(i). We show that for all $\alpha \le \beta - c_0/\log n$, for some large enough constant
$c_0 > 0$ that only depends on $\|\mu_0\|_\beta$, it holds that $h_n(\alpha) \le l$, where $l$ is the
given positive constant in the definition of $\underline{\alpha}_n$.

The sum in the definition (2.6) of $h_n$ can be split into two sums, one over
indices $i \le n^{1/(1+2\alpha+2p)}$ and one over indices $i > n^{1/(1+2\alpha+2p)}$. The second
sum is bounded by

$n^2 \sum_{i > n^{1/(1+2\alpha+2p)}} i^{-1-2\alpha-4p-2\beta} (\log i) \, i^{2\beta}\mu_{0,i}^2.$

Since the function $x \mapsto x^{-\gamma}\log x$ is decreasing on $[e^{1/\gamma}, \infty)$, this is further
bounded by

$\frac{\|\mu_0\|_\beta^2}{1+2\alpha+2p} \, n^{\frac{1+2\alpha-2\beta}{1+2\alpha+2p}} \log n.$

The sum over $i \le n^{1/(1+2\alpha+2p)}$ is upper bounded by

$\sum_{i \le n^{1/(1+2\alpha+2p)}} i^{1+2\alpha-2\beta} \, i^{2\beta}\mu_{0,i}^2 \log i.$

Since the logarithm is increasing we can take $(\log n)/(1+2\alpha+2p)$ outside
the sum and then bound $i^{1+2\alpha-2\beta}$ above by $n^{(1+2\alpha-2\beta)/(1+2\alpha+2p) \vee 0}$ to arrive
at the subsequent bound

$\frac{\|\mu_0\|_\beta^2}{1+2\alpha+2p} \, n^{0 \vee \frac{1+2\alpha-2\beta}{1+2\alpha+2p}} \log n.$

Combining the bounds for the two sums we obtain the upper bound

$h_n(\alpha) \lesssim \|\mu_0\|_\beta^2 \, n^{-\frac{1 \wedge 2(\beta-\alpha)}{1+2\alpha+2p}},$

valid for all $\alpha > 0$. Now suppose that $\alpha \le \beta - c_0/\log n$. Then for $n$ large
enough, the power of $n$ on the right-hand side is bounded by

$n^{-\frac{1 \wedge 2(c_0/\log n)}{1+2\alpha+2p}} = e^{-\frac{2c_0}{1+2\alpha+2p}}.$

Hence given $l > 0$ we can choose $c_0$ so large, only depending on $\|\mu_0\|_\beta$, that
$h_n(\alpha) \le l$ for $\alpha \le \beta - c_0/\log n$.
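
The combined bound of this step can be checked numerically. In the sketch below the multiplicative constant 2 is our own bookkeeping (each of the two partial sums contributes at most one copy of the displayed bound), $\kappa_i = i^{-p}$ is assumed, and the truth is a hypothetical finitely supported sequence so that the Sobolev norm is computed exactly. The names `h_n` and `proof_bound` are illustrative.

```python
import math

def h_n(alpha, mu0, n, p):
    """h_n from (2.6) for a finitely supported truth, assuming kappa_i = i**(-p)."""
    r = 1 + 2 * alpha + 2 * p
    total = sum(n * n * i ** (1 + 2 * alpha) * m * m * math.log(i) / (i ** r + n) ** 2
                for i, m in enumerate(mu0, start=1))
    return r / (n ** (1 / r) * math.log(n)) * total

def proof_bound(alpha, beta, mu0, n, p):
    """2 * ||mu0||_beta^2 * n^(-(1 ^ 2(beta - alpha)) / (1 + 2 alpha + 2 p))."""
    sobolev_sq = sum(i ** (2 * beta) * m * m for i, m in enumerate(mu0, start=1))
    exponent = -min(1.0, 2 * (beta - alpha)) / (1 + 2 * alpha + 2 * p)
    return 2 * sobolev_sq * n ** exponent

beta, n, p = 1.0, 10**6, 1.0
mu0 = [i ** -2.0 for i in range(1, 301)]          # hypothetical truth in S^1
checks = [(a, h_n(a, mu0, n, p), proof_bound(a, beta, mu0, n, p))
          for a in (0.1, 0.5, 0.9, 1.5, 2.5)]
bound_holds = all(h <= b for _, h, b in checks)
```

For $\alpha$ below $\beta$ the bound decays polynomially in $n$, which is what forces $h_n(\alpha)$ under the level $l$.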

(ii). We show that in this case we have $h_n(\alpha) \le l$ for
$\alpha \le \sqrt{\log n}/(\log\log n)$ and $n \ge n_0$, where $n_0$ only depends on $\|\mu_0\|_{A^\gamma}$. Again
we give an upper bound for $h_n$ by splitting the sum in its definition into two
smaller sums. The one over indices $i > n^{1/(1+2\alpha+2p)}$ is bounded by

$n^2 \sum_{i > n^{1/(1+2\alpha+2p)}} i^{-1-2\alpha-4p} e^{-2\gamma i} (\log i) \, e^{2\gamma i}\mu_{0,i}^2.$

Using the fact that for $\delta > 0$ the function $x \mapsto x^{-\delta}e^{-2\gamma x}\log x$ is decreasing
on $[e^{1/\delta}, \infty)$ we can see that this is further bounded by

$\frac{\|\mu_0\|_{A^\gamma}^2}{1+2\alpha+2p} \, e^{-2\gamma n^{1/(1+2\alpha+2p)}} n^{\frac{1+2\alpha}{1+2\alpha+2p}} \log n.$

The sum over indices $i \le n^{1/(1+2\alpha+2p)}$ is bounded by

$\frac{\log n}{1+2\alpha+2p} \sum_{i \le n^{1/(1+2\alpha+2p)}} i^{1+2\alpha} e^{-2\gamma i} \, e^{2\gamma i}\mu_{0,i}^2.$

Since the maximum on $(0, \infty)$ of the function $x \mapsto x^{1+2\alpha}\exp(-2\gamma x)$ equals
$\exp((1+2\alpha)(\log((1+2\alpha)/(2\gamma)) - 1))$, we have the subsequent bound

$\frac{\|\mu_0\|_{A^\gamma}^2}{1+2\alpha+2p} \, e^{(1+2\alpha)\log((1+2\alpha)/(2\gamma))} \log n.$

Combining the two bounds we find that

$h_n(\alpha) \le \|\mu_0\|_{A^\gamma}^2 \Big( n^{\frac{2\alpha}{1+2\alpha+2p}} e^{-2\gamma n^{1/(1+2\alpha+2p)}} + n^{-\frac{1}{1+2\alpha+2p}} e^{(1+2\alpha)\log((1+2\alpha)/(2\gamma))} \Big)$

for all $\alpha > 0$. It is then easily verified that for the given constant $l > 0$, we
have $h_n(\alpha) \le l$ for $n \ge n_0$ if $\alpha \le \sqrt{\log n}/\log\log n$, where $n_0$ only depends
on $\|\mu_0\|_{A^\gamma}$.

(iii). Let $\gamma_n = \gamma + C_0(\log\log n)/(\log n)$. We will show that for $n$ large
enough, $h_n(\gamma_n) \ge L(\log n)^2$, provided $C_0$ is large enough. Note that

$\sum_{i=1}^\infty \frac{n^2 i^{1+2\gamma_n}\mu_{0,i}^2 \log i}{(i^{1+2\gamma_n+2p} + n)^2} \ge \frac{c^2}{4} \sum_{i \le n^{1/(1+2\gamma_n+2p)}} i^{2\gamma_n - 2\gamma} \log i.$

By monotonicity and the fact that $\lfloor x \rfloor \ge x/2$ for $x$ large, the sum on the
right is bounded from below by the integral

$\int_0^{n^{1/(1+2\gamma_n+2p)}/2} x^{2\gamma_n - 2\gamma} \log x \, dx.$

This integral can be computed explicitly and is for large $n$ bounded from
below by a constant times

$\frac{\log n}{1+2\gamma_n+2p} \, n^{\frac{2\gamma_n - 2\gamma + 1}{1+2\gamma_n+2p}}.$

It follows that, for large enough $n$, $h_n(\gamma_n)$ is bounded from below by a
constant times $c^2 n^{2(\gamma_n - \gamma)/(1+2\gamma_n+2p)}$. Since $(\log\log n)/(\log n) \le 1/4$ for $n$
large enough, we obtain

$n^{2(\gamma_n - \gamma)/(1+2\gamma_n+2p)} \ge n^{\frac{2C_0}{\log n}(\log\log n)\frac{1}{1+2\gamma+C_0/2+2p}} = (\log n)^{2C_0/(1+2\gamma+C_0/2+2p)}.$

Hence for $C_0$ large enough, only depending on $c$ and $\gamma$, we indeed have that
$h_n(\gamma_n) \ge L(\log n)^2$ for large $n$.
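(iv). If $\mu_{0,i} \ne 0$ for $i \ge 2$, then for this fixed $i$,

$h_n(\alpha) \gtrsim \frac{1+2\alpha+2p}{n^{1/(1+2\alpha+2p)} \log n} \, \frac{n^2 i^{1+2\alpha}}{(i^{1+2\alpha+2p} + n)^2}.$

Now define $\tilde\alpha_n$ such that $i^{1+2\tilde\alpha_n+2p} = n$. Then by construction we have
$h_n(\tilde\alpha_n) \gtrsim n^{1-1/(1+2\tilde\alpha_n+2p)}$. Since $\tilde\alpha_n \to \infty$ the right side is larger than
$L\log^2 n$ for $n$ large enough, irrespective of the value of $L$, hence $\overline{\alpha}_n \le \tilde\alpha_n \le
(\log n)/(2\log 2) - 1/2 - p$.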

5. Proof of Theorem 2.2. With the help of the dominated convergence
theorem one can see that the random function $\ell_n$ is ($P_0$-a.s.) differentiable
and its derivative, which we denote by $M_n$, is given by

$M_n(\alpha) = \sum_{i=1}^\infty \frac{n \log i}{i^{1+2\alpha}\kappa_i^{-2} + n} - \sum_{i=1}^\infty \frac{n^2 i^{1+2\alpha}\kappa_i^{-2} \log i}{(i^{1+2\alpha}\kappa_i^{-2} + n)^2} Y_i^2.$

We will show that on the interval $(0, \underline{\alpha}_n + 1/\log n]$ the random function
$M_n$ is positive and bounded away from $0$ with probability tending to one,
hence $\ell_n$ has no local maximum in this interval. Next we distinguish two
cases according to the value of $\overline{\alpha}_n$. If $\overline{\alpha}_n = \infty$, then the inequality $\hat\alpha_n \le \overline{\alpha}_n$
trivially holds. In the case $\overline{\alpha}_n < \infty$ we show that the integral of $M_n$ over
the interval $[\overline{\alpha}_n, \infty)$ is a.s. upper bounded by a fixed positive constant times
$n^{1/(1+2\overline{\alpha}_n+2p)}(\log n)^2/(1+2\overline{\alpha}_n+2p)$. Then we prove the constant $L$ can
be set such that on the interval $[\overline{\alpha}_n - 1/\log n, \overline{\alpha}_n]$ the function $M_n$ is bounded by an
arbitrary negative constant times $n^{1/(1+2\overline{\alpha}_n+2p)}(\log n)^3/(1+2\overline{\alpha}_n+2p)$ with
probability tending to one uniformly, hence the integral of $M_n$ over this
interval is bounded above by an arbitrarily large negative constant times
$n^{1/(1+2\overline{\alpha}_n+2p)}(\log n)^2/(1+2\overline{\alpha}_n+2p)$. This means that on the interval
$[\overline{\alpha}_n - 1/\log n, \overline{\alpha}_n]$ the function $\ell_n(\alpha)$ decreases more than it can possibly increase
on the interval $[\overline{\alpha}_n, \infty)$. Therefore, it holds with probability tending to one
that $\ell_n$ has no global maximum on $(\overline{\alpha}_n - 1/\log n, \infty)$.

We only present the details of the proof for the case that fi £ Si 3 . The 
case ^0 £ can be handled along the same lines. Again for simplicity we 
assume Ki = i~ p in the proof. 

5.1. $M_n(\alpha)$ on $[\overline\alpha_n, \infty)$. In this section we give a deterministic upper
bound for the integral of $M_n(\alpha)$ on the interval $[\overline\alpha_n, \infty)$. We can restrict to
the case that $\mu_{0,i} \neq 0$ for some $i \ge 2$, since otherwise $\overline\alpha_n = \infty$. By Lemma
2.1, we have $\overline\alpha_n \le a_n$ in this case, where $a_n = (\log n)/(2\log 2) - 1/2 - p$.

We have the trivial bound

\[ M_n(\alpha) \le \sum_{i=1}^\infty \frac{n\log i}{i^{1+2\alpha+2p}+n}. \]

An application of Lemma 8.1.(i) with $r = 1+2\alpha+2p$ and $c = \beta+2p$ shows
that for $\beta/2 < \alpha \le a_n$,

\[ M_n(\alpha) \lesssim \frac{1}{1+2\alpha+2p}\, n^{1/(1+2\alpha+2p)}\log n. \]

For $\alpha > a_n$ we apply Lemma 8.1.(ii), and see that $M_n(\alpha) \lesssim n2^{-1-2\alpha-2p}$.
Using the fact that $x \mapsto 2^{-x}x^3$ is decreasing for large $x$, it is easily seen that
$n2^{-1-2\alpha-2p} \lesssim (\log^3 n)/(1+2\alpha+2p)^3$ for $\alpha > a_n$, hence

\[ M_n(\alpha) \lesssim \frac{\log^3 n}{(1+2\alpha+2p)^3}. \]
By Lemma 2.1 we have $\beta/2 < \overline\alpha_n$ for large enough $n$. It follows that the
integral we want to bound is bounded by a constant times

\[ n^{1/(1+2\overline\alpha_n+2p)}\log n\int_{\overline\alpha_n}^{a_n}\frac{d\alpha}{1+2\alpha+2p} + \log^3 n\int_{a_n}^\infty\frac{d\alpha}{(1+2\alpha+2p)^3}. \]

This quantity is bounded by a constant times

\[ \frac{n^{1/(1+2\overline\alpha_n+2p)}\log^2 n}{1+2\overline\alpha_n+2p}. \]



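5.2. $M_n(\alpha)$ on $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n]$. In this section we show that
the process $M_n(\alpha)$ is with probability going to one smaller than a negative,
arbitrarily large constant times $n^{1/(1+2\overline\alpha_n+2p)}(\log n)^3/(1+2\overline\alpha_n+2p)$ uniformly
on the interval $[\overline\alpha_n - 1/\log n, \overline\alpha_n]$. More precisely, we show that for every
$\beta, R, M > 0$, the constant $L > 0$ in the definition of $\overline\alpha_n$ can be chosen such
that

\[ \limsup_{n\to\infty}\ \sup_{\|\mu_0\|_\beta\le R}\ \sup_{\alpha\in[\overline\alpha_n-1/\log n,\,\overline\alpha_n]} \mathrm{E}_0\frac{(1+2\alpha+2p)M_n(\alpha)}{n^{1/(1+2\alpha+2p)}(\log n)^3} < -M, \tag{5.1} \]

\[ \sup_{\|\mu_0\|_\beta\le R} \mathrm{E}_0 \sup_{\alpha\in[\overline\alpha_n-1/\log n,\,\overline\alpha_n]} \frac{(1+2\alpha+2p)\,|M_n(\alpha)-\mathrm{E}_0M_n(\alpha)|}{n^{1/(1+2\alpha+2p)}(\log n)^3} \to 0. \tag{5.2} \]

The expected value of the normalized version of the process $M_n$ given on
the left-hand side of (5.1) is equal to

\[ \frac{1+2\alpha+2p}{n^{1/(1+2\alpha+2p)}(\log n)^3}\Bigl(\sum_{i=1}^\infty\frac{n^2\log i}{(i^{1+2\alpha+2p}+n)^2} - \sum_{i=1}^\infty\frac{n^2 i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2}\Bigr). \tag{5.3} \]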

We write this as the sum of two terms and bound the first term by

\[ \frac{1+2\alpha+2p}{n^{1/(1+2\alpha+2p)}(\log n)^3}\sum_{i=1}^\infty\frac{n\log i}{i^{1+2\alpha+2p}+n}. \]

By Lemma 8.1.(i), for all $\alpha > c/2 - p$, where $c$ is an arbitrary positive
constant, this can be further bounded by a multiple of $1/(\log n)^2$. By Lemma
2.1, $\beta/4 < \overline\alpha_n - 1/\log n$ for large enough $n$, hence by choosing $c = \beta/2+2p$ we
get that the right hand side of the preceding display tends to zero uniformly
over $[\overline\alpha_n - 1/\log n, \infty)$. We now consider the second term in (5.3), which is
equal to $-h_n(\alpha)/(\log n)^2$. By Lemma 5.1 for any $\mu_0 \in \ell_2$ and $n \ge e^4$ we have

\[ \frac{h_n(\alpha)}{(\log n)^2} \gtrsim \frac{h_n(\overline\alpha_n)}{(\log n)^2} = L, \]

where the last equality holds by the definition of $\overline\alpha_n$. This concludes the
proof of (5.1).

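To verify (5.2) it suffices, by Corollary 2.2.5 in [29] (applied with $\psi(x) = x^2$),
to show that

\[ \sup_{\|\mu_0\|_\beta\le R}\ \sup_{\alpha\in[\overline\alpha_n-1/\log n,\,\overline\alpha_n]} \operatorname{var}_0\frac{(1+2\alpha+2p)M_n(\alpha)}{n^{1/(1+2\alpha+2p)}(\log n)^3} \to 0, \tag{5.4} \]

and

\[ \sup_{\|\mu_0\|_\beta\le R}\int_0^{\mathrm{diam}_n}\sqrt{N(\varepsilon, [\overline\alpha_n-1/\log n, \overline\alpha_n], d_n)}\,d\varepsilon \to 0, \]

where $d_n$ is the semimetric defined by

\[ d_n^2(\alpha_1,\alpha_2) = \operatorname{var}_0\Bigl(\frac{(1+2\alpha_1+2p)M_n(\alpha_1)}{n^{1/(1+2\alpha_1+2p)}(\log n)^3} - \frac{(1+2\alpha_2+2p)M_n(\alpha_2)}{n^{1/(1+2\alpha_2+2p)}(\log n)^3}\Bigr), \]

$\mathrm{diam}_n$ is the diameter of $[\overline\alpha_n - 1/\log n, \overline\alpha_n]$ relative to $d_n$, and $N(\varepsilon, B, d)$ is
the minimal number of $d$-balls of radius $\varepsilon$ needed to cover the set $B$.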
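By Lemma 5.2

\[ \operatorname{var}_0\frac{(1+2\alpha+2p)M_n(\alpha)}{n^{1/(1+2\alpha+2p)}(\log n)^3} \lesssim \frac{n^{-1/(1+2\alpha+2p)}}{(\log n)^4}\bigl(1+h_n(\alpha)\bigr), \tag{5.5} \]

(with an implicit constant that does not depend on $\mu_0$ and $\alpha$). By the
definition of $\overline\alpha_n$ the function $h_n(\alpha)$ is bounded above by $L(\log n)^2$ on the
interval $[\overline\alpha_n - 1/\log n, \overline\alpha_n]$. Together with (5.5) it proves (5.4).

The last bound also shows that the $d_n$-diameter of the set
$[\overline\alpha_n - 1/\log n, \overline\alpha_n]$ is bounded above by a constant times $(\log n)^{-1}$, with a
constant that does not depend on $\mu_0$ and $\alpha$. By Lemma 5.3 and the fact
that $h_n(\alpha) \le L(\log n)^2$ for $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n]$, we get the upper bound, for
$\alpha_1, \alpha_2 \in [\overline\alpha_n - 1/\log n, \overline\alpha_n]$,

\[ d_n(\alpha_1,\alpha_2) \lesssim |\alpha_1-\alpha_2|, \]

with a constant that does not depend on $\mu_0$. Therefore
$N(\varepsilon, [\overline\alpha_n - 1/\log n, \overline\alpha_n], d_n) \lesssim 1/(\varepsilon\log n)$ and hence

\[ \sup_{\|\mu_0\|_\beta\le R}\int_0^{\mathrm{diam}_n}\sqrt{N(\varepsilon, [\overline\alpha_n-1/\log n, \overline\alpha_n], d_n)}\,d\varepsilon \lesssim \frac{1}{\log n} \to 0. \]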
5.3. $M_n(\alpha)$ on $(0, \underline\alpha_n + 1/\log n]$. In this subsection we prove that if the
constant $l$ in the definition of $\underline\alpha_n$ is small enough, then

\[ \liminf_{n\to\infty}\ \inf_{\mu_0\in\ell_2}\ \inf_{\alpha\in(0,\,\underline\alpha_n+1/\log n]} \mathrm{E}_0\frac{(1+2\alpha+2p)M_n(\alpha)}{n^{1/(1+2\alpha+2p)}\log n} > 0, \tag{5.6} \]

\[ \sup_{\mu_0\in\ell_2} \mathrm{E}_0 \sup_{\alpha\in(0,\,\underline\alpha_n+1/\log n]} \frac{(1+2\alpha+2p)\,|M_n(\alpha)-\mathrm{E}_0M_n(\alpha)|}{n^{1/(1+2\alpha+2p)}\log n} \to 0. \tag{5.7} \]

This shows that $M_n$ is positive throughout $(0, \underline\alpha_n + 1/\log n]$ with probability
tending to one uniformly over $\ell_2$.

Since $\mathrm{E}_0Y_i^2 = \kappa_i^2\mu_{0,i}^2 + 1/n$, the expected value on the left-hand side of
(5.6) is equal to

\[ \frac{1+2\alpha+2p}{n^{1/(1+2\alpha+2p)}\log n}\sum_{i=1}^\infty\frac{n^2\log i}{(i^{1+2\alpha+2p}+n)^2} - h_n(\alpha). \tag{5.8} \]

We first find a lower bound for the first term. Since $\underline\alpha_n \le \sqrt{\log n}$ by definition,
we have $\alpha \ll \log n$ for all $\alpha \in (0, \underline\alpha_n + 1/\log n]$. Then it follows from
Lemma 8.3 that for $n$ large enough, the first term in (5.8) is bounded from
below by 1/12 for all $\alpha \in (0, \underline\alpha_n + 1/\log n]$. Next note that by definition of
$h_n$ and Lemma 5.1, we have

\[ \sup_{\alpha\in(0,\,\underline\alpha_n+1/\log n]} h_n(\alpha) \le Kl, \]

where $K > 0$ is a constant independent of $\mu_0$. So by choosing $l > 0$ small
enough, we can indeed ensure that (5.6) is true.

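To verify (5.7) it suffices again, by Corollary 2.2.5 in [29] applied with
$\psi(x) = x^2$, to show that

\[ \sup_{\mu_0\in\ell_2}\ \sup_{\alpha\in(0,\,\underline\alpha_n+1/\log n]} \operatorname{var}_0\frac{(1+2\alpha+2p)M_n(\alpha)}{n^{1/(1+2\alpha+2p)}\log n} \to 0, \tag{5.9} \]

and

\[ \sup_{\mu_0\in\ell_2}\int_0^{\mathrm{diam}_n}\sqrt{N(\varepsilon, (0, \underline\alpha_n+1/\log n], d_n)}\,d\varepsilon \to 0, \]

where $d_n$ is the semimetric defined by

\[ d_n^2(\alpha_1,\alpha_2) = \operatorname{var}_0\Bigl(\frac{(1+2\alpha_1+2p)M_n(\alpha_1)}{n^{1/(1+2\alpha_1+2p)}\log n} - \frac{(1+2\alpha_2+2p)M_n(\alpha_2)}{n^{1/(1+2\alpha_2+2p)}\log n}\Bigr), \]

$\mathrm{diam}_n$ is the diameter of $(0, \underline\alpha_n + 1/\log n]$ relative to $d_n$, and $N(\varepsilon, B, d)$ is
the minimal number of $d$-balls of radius $\varepsilon$ needed to cover the set $B$.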
By Lemma 5.2

\[ \operatorname{var}_0\frac{(1+2\alpha+2p)M_n(\alpha)}{n^{1/(1+2\alpha+2p)}\log n} \lesssim n^{-1/(1+2\alpha+2p)}\bigl(1+h_n(\alpha)\bigr), \tag{5.10} \]

with a constant that does not depend on $\mu_0$ and $\alpha$. We have seen that
on the interval $(0, \underline\alpha_n + 1/\log n]$ the function $h_n$ is bounded by a constant
times $l$, hence the variance in (5.9) is bounded by a multiple of
$n^{-1/(1+2\underline\alpha_n+2/\log n+2p)} \le e^{-(1/3)\sqrt{\log n}} \to 0$, which proves (5.9).

The variance bound above also implies that the $d_n$-diameter of the set
$(0, \underline\alpha_n + 1/\log n]$ is bounded by a multiple of $e^{-(1/6)\sqrt{\log n}}$. By Lemma 5.3,
the definition of $\underline\alpha_n$ and Lemma 5.1,

\[ d_n(\alpha_1,\alpha_2) \lesssim |\alpha_1-\alpha_2|\,(\log n)\sqrt{n^{-1/(1+2\underline\alpha_n+2/\log n+2p)}} \lesssim |\alpha_1-\alpha_2|, \]

with constants that do not depend on $\mu_0$. Hence for the covering number of
$(0, \underline\alpha_n + 1/\log n] \subset (0, 2\sqrt{\log n})$ we have

\[ N(\varepsilon, (0, \underline\alpha_n+1/\log n], d_n) \lesssim \frac{\sqrt{\log n}}{\varepsilon}, \]

and therefore

\[ \sup_{\mu_0\in\ell_2}\int_0^{\mathrm{diam}_n}\sqrt{N(\varepsilon, (0, \underline\alpha_n+1/\log n], d_n)}\,d\varepsilon \lesssim (\log n)^{1/4}e^{-(1/12)\sqrt{\log n}} \to 0. \]



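5.4. Bounds on $h_n(\alpha)$, variances and distances. In this section we prove
a number of auxiliary lemmas used in the preceding. The first one is about
the behavior of the function $h_n$ in a neighborhood of $\underline\alpha_n$ and $\overline\alpha_n$.

Lemma 5.1. The function $h_n$ satisfies the following bounds:

\[ h_n(\alpha) \gtrsim h_n(\overline\alpha_n), \quad \text{for } \alpha\in\Bigl[\overline\alpha_n-\frac{1}{\log n},\,\overline\alpha_n\Bigr] \text{ and } n\ge e^4, \]
\[ h_n(\alpha) \lesssim h_n(\underline\alpha_n), \quad \text{for } \alpha\in\Bigl[\underline\alpha_n,\,\underline\alpha_n+\frac{1}{\log n}\Bigr] \text{ and } n\ge e^2. \]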



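Proof. We provide a detailed proof of the first inequality, the second
one can be proved using similar arguments.
Let

\[ S_n(\alpha) = \sum_{i=1}^\infty\frac{n^2 i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2} \]

be the sum in the definition of $h_n$. Splitting the sum into two parts we get,
for $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n]$,

\[ 4S_n(\alpha) \ge \sum_{i\le n^{1/(1+2\alpha+2p)}} i^{1+2\overline\alpha_n-2/\log n}\mu_{0,i}^2\log i + n^2\sum_{i>n^{1/(1+2\alpha+2p)}} i^{-1-2\overline\alpha_n-4p}\mu_{0,i}^2\log i. \]

In the first sum $i^{-2/\log n}$ can be bounded below by $\exp(-2)$. Furthermore,
for $i \in [n^{1/(1+2\overline\alpha_n+2p)}, n^{1/(1+2\alpha+2p)}]$ we have the inequality
$i^{1+2\overline\alpha_n} \ge n^2 i^{-1-2\overline\alpha_n-4p}$.
Therefore $S_n(\alpha)$ can be bounded from below by a constant times

\[ \sum_{i\le n^{1/(1+2\overline\alpha_n+2p)}} i^{1+2\overline\alpha_n}\mu_{0,i}^2\log i + n^2\sum_{i>n^{1/(1+2\overline\alpha_n+2p)}} i^{-1-2\overline\alpha_n-4p}\mu_{0,i}^2\log i \]
\[ \ge \sum_{i\le n^{1/(1+2\overline\alpha_n+2p)}}\frac{n^2 i^{1+2\overline\alpha_n}\mu_{0,i}^2\log i}{(i^{1+2\overline\alpha_n+2p}+n)^2} + \sum_{i>n^{1/(1+2\overline\alpha_n+2p)}}\frac{n^2 i^{1+2\overline\alpha_n}\mu_{0,i}^2\log i}{(i^{1+2\overline\alpha_n+2p}+n)^2}. \]

Hence, we have $S_n(\alpha) \gtrsim S_n(\overline\alpha_n)$ for $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n]$.

Next note that for $n \ge e^4$ we have $2(1+2\overline\alpha_n - 2/\log n+2p) \ge 1+2\overline\alpha_n+2p$.
Moreover, $n^{-1/(1+2\overline\alpha_n-2/\log n+2p)} \gtrsim n^{-1/(1+2\overline\alpha_n+2p)}$. Therefore

\[ \frac{1+2\alpha+2p}{n^{1/(1+2\alpha+2p)}\log n} \gtrsim \frac{1+2\overline\alpha_n+2p}{n^{1/(1+2\overline\alpha_n+2p)}\log n} \]

for $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n]$ and for $n \ge e^4$. Combining this with the inequality
for $S_n(\alpha)$ yields the desired result. ■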

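Next we present two results on variances involving the random function
$M_n$.

Lemma 5.2. For any $\alpha > 0$,

\[ \operatorname{var}_0\frac{(1+2\alpha+2p)M_n(\alpha)}{n^{1/(1+2\alpha+2p)}} \lesssim n^{-1/(1+2\alpha+2p)}(\log n)^2\bigl(1+h_n(\alpha)\bigr). \]

Proof. The random variables $Y_i^2$ are independent and $\operatorname{var}_0Y_i^2 = 2/n^2 +
4\kappa_i^2\mu_{0,i}^2/n$, hence the variance in the statement of the lemma is equal to

\[ \frac{2n^2(1+2\alpha+2p)^2}{n^{2/(1+2\alpha+2p)}}\sum_{i=1}^\infty\frac{i^{2+4\alpha+4p}(\log i)^2}{(i^{1+2\alpha+2p}+n)^4} + \frac{4n^3(1+2\alpha+2p)^2}{n^{2/(1+2\alpha+2p)}}\sum_{i=1}^\infty\frac{i^{2+4\alpha+2p}(\log i)^2\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^4}. \tag{5.11} \]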
i=l 
By Lemma 8.4 the first term is bounded by

\[ \frac{2n(1+2\alpha+2p)\log n}{n^{2/(1+2\alpha+2p)}}\sum_{i=1}^\infty\frac{i^{1+2\alpha+2p}\log i}{(i^{1+2\alpha+2p}+n)^2} \le \frac{2(1+2\alpha+2p)\log n}{n^{2/(1+2\alpha+2p)}}\sum_{i=1}^\infty\frac{n\log i}{i^{1+2\alpha+2p}+n}. \]

Lemma 8.1.(iii) further bounds the right hand side of the above display by
a multiple of $n^{-1/(1+2\alpha+2p)}(\log n)^2$ uniformly for $\alpha > c$, where $c > 0$ is an
arbitrary constant. For $\alpha \le c$ we get the same bound by applying Lemma 8.2
(with $m = 2$, $l = 4$, $r = 1+2\alpha+2p$, $r_0 = 1+2c+2p$, and $s = 2r$) to the
first term in (5.11). By Lemma 8.4, the second term in (5.11) is bounded by

\[ 4n^{-2/(1+2\alpha+2p)}(1+2\alpha+2p)(\log n)\sum_{i=1}^\infty\frac{n^2 i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2} = 4n^{-1/(1+2\alpha+2p)}(\log n)^2h_n(\alpha). \]

Combining the upper bounds for the two terms we arrive at the assertion of
the lemma. ■

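Lemma 5.3. For any $0 < \alpha_1 < \alpha_2 < \infty$ we have that

\[ \operatorname{var}_0\Bigl(\frac{(1+2\alpha_1+2p)M_n(\alpha_1)}{n^{1/(1+2\alpha_1+2p)}} - \frac{(1+2\alpha_2+2p)M_n(\alpha_2)}{n^{1/(1+2\alpha_2+2p)}}\Bigr) \lesssim (\alpha_1-\alpha_2)^2(\log n)^4\sup_{\alpha\in[\alpha_1,\alpha_2]} n^{-1/(1+2\alpha+2p)}\bigl(1+h_n(\alpha)\bigr), \]

with a constant that does not depend on $\alpha$ and $\mu_0$.

Proof. The variance we have to bound can be written as

\[ n^4\sum_{i=1}^\infty\bigl(f_i(\alpha_1)-f_i(\alpha_2)\bigr)^2(\log i)^2\operatorname{var}_0Y_i^2, \]

where $f_i(\alpha) = (1+2\alpha+2p)\,i^{1+2\alpha+2p}\,n^{-1/(1+2\alpha+2p)}(i^{1+2\alpha+2p}+n)^{-2}$. For the
derivative of $f_i$ we have

\[ |f_i'(\alpha)| = 2f_i(\alpha)\Bigl|\frac{1}{1+2\alpha+2p} + \log i + \frac{\log n}{(1+2\alpha+2p)^2} - \frac{2i^{1+2\alpha+2p}\log i}{i^{1+2\alpha+2p}+n}\Bigr| \le 8f_i(\alpha)\bigl(\log i + (\log n)/(1+2\alpha+2p)^2\bigr), \]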
hence the variance is bounded by a constant times

\[ (\alpha_1-\alpha_2)^2 n^4\sup_{\alpha\in[\alpha_1,\alpha_2]}(1+2\alpha+2p)^2\sum_{i=1}^\infty\frac{i^{2+4\alpha+4p}(\log i)^2\bigl(\log i+(\log n)/(1+2\alpha+2p)^2\bigr)^2}{n^{2/(1+2\alpha+2p)}(i^{1+2\alpha+2p}+n)^4}\operatorname{var}_0Y_i^2. \]

Since $\operatorname{var}_0Y_i^2 = 2/n^2 + 4\kappa_i^2\mu_{0,i}^2/n$, it suffices to show that both

\[ n^2\sup_{\alpha\in[\alpha_1,\alpha_2]}(1+2\alpha+2p)^2\sum_{i=1}^\infty\frac{i^{2+4\alpha+4p}(\log i)^2\bigl(\log i+(\log n)/(1+2\alpha+2p)^2\bigr)^2}{n^{2/(1+2\alpha+2p)}(i^{1+2\alpha+2p}+n)^4} \tag{5.12} \]

and

\[ n^3\sup_{\alpha\in[\alpha_1,\alpha_2]}(1+2\alpha+2p)^2\sum_{i=1}^\infty\frac{i^{2+4\alpha+2p}(\log i)^2\mu_{0,i}^2\bigl(\log i+(\log n)/(1+2\alpha+2p)^2\bigr)^2}{n^{2/(1+2\alpha+2p)}(i^{1+2\alpha+2p}+n)^4} \tag{5.13} \]

are bounded by a constant times $(\log n)^4\sup_{\alpha\in[\alpha_1,\alpha_2]}n^{-1/(1+2\alpha+2p)}(1+h_n(\alpha))$.

By applying Lemma 8.4 twice (once the first statement with r = l+2a+2p 
and m = 1 and once the second one with the same r and m = 3 and £ = 1) 
the expression in (5.13) is seen to be bounded above by a constant times 

\[ (\log n)^3\sup_{\alpha\in[\alpha_1,\alpha_2]}\Bigl(n^{-2/(1+2\alpha+2p)}(1+2\alpha+2p)\sum_{i=1}^\infty\frac{n^2 i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2}\Bigr). \]

The expression in the parentheses equals $h_n(\alpha)n^{-1/(1+2\alpha+2p)}\log n$. Now fix
$c > 0$. Again, applying Lemma 8.4 twice implies that (5.12) is
bounded above by

\[ (\log n)^3\sup_{\alpha\in[\alpha_1,\alpha_2]}\Bigl(2n^{-2/(1+2\alpha+2p)}(1+2\alpha+2p)\sum_{i=1}^\infty\frac{n\,i^{1+2\alpha+2p}\log i}{(i^{1+2\alpha+2p}+n)^2}\Bigr). \]

Using the inequality $x/(x+y) \le 1$ and Lemma 8.1.(iii), the expression in
the parentheses can be bounded by a constant times $n^{-1/(1+2\alpha+2p)}\log n$ for
$\alpha > c$. For $\alpha \le c$, Lemma 8.2 (with $m = 2$ or $m = 4$, $l = 4$, $r = 1+2\alpha+2p$,
$r_0 = 1+2c+2p$, and $s = 2r$) gives the same bound (or even a better one)
for (5.12). The proof is completed by combining the obtained bounds. ■
6. Proof of Theorem 2.3. As before we only present the details of
the proof for the Sobolev case $\mu_0 \in S^\beta$. The analytic case can be dealt with
similarly. Again, we assume the exact equality $\kappa_i = i^{-p}$ for simplicity.

By Markov's inequality and Theorem 2.2,

\[ \sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\Pi_{\hat\alpha_n}\bigl(\|\mu-\mu_0\|\ge M_n\varepsilon_n \mid Y\bigr) \le \frac{1}{M_n^2\varepsilon_n^2}\sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\sup_{\alpha\in[\underline\alpha_n,\,\overline\alpha_n\wedge\log n]}R_n(\alpha) + o(1), \tag{6.1} \]

where

\[ R_n(\alpha) = \int\|\mu-\mu_0\|^2\,\Pi_\alpha(d\mu \mid Y) \]

is the posterior risk. We will show in the subsequent subsections that for
$\varepsilon_n = n^{-\beta/(1+2\beta+2p)}(\log n)^{3/2}(\log\log n)^{1/2}$ and arbitrary $M_n \to \infty$, the first
term on the right of (6.1) vanishes as $n \to \infty$. Note that by the explicit
posterior computation (2.3), we have

\[ R_n(\alpha) = \sum_{i=1}^\infty(\hat\mu_{\alpha,i}-\mu_{0,i})^2 + \sum_{i=1}^\infty\frac{i^{2p}}{i^{1+2\alpha+2p}+n}, \tag{6.2} \]

where $\hat\mu_{\alpha,i} = n i^p(i^{1+2\alpha+2p}+n)^{-1}Y_i$ is the $i$th coefficient of the posterior
mean.

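6.1. Bound for the expected posterior risk. In this section we prove that

\[ \sup_{\|\mu_0\|_\beta\le R}\ \sup_{\underline\alpha_n\le\alpha\le\overline\alpha_n\wedge\log n}\mathrm{E}_0R_n(\alpha) = O(\varepsilon_n^2). \]

To this end we define the sets

\[ P_n = \{\mu_0: \|\mu_0\|_\beta\le R,\ \mu_{0,i}\neq 0 \text{ for some } i\ge 2\}, \]
\[ Q_n = \{\mu_0: \|\mu_0\|_\beta\le R,\ \mu_{0,i} = 0 \text{ for all } i\ge 2\}. \]

By Lemma 2.1.(iv), we have that $\overline\alpha_n \le \log n$ if $\mu_0 \in P_n$. Hence, it suffices to
show that

\[ \sup_{\mu_0\in P_n}\ \sup_{\underline\alpha_n\le\alpha\le\overline\alpha_n}\mathrm{E}_0R_n(\alpha) = O(\varepsilon_n^2), \qquad \sup_{\mu_0\in Q_n}\ \sup_{\underline\alpha_n\le\alpha\le\log n}\mathrm{E}_0R_n(\alpha) = O(\varepsilon_n^2). \]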

The second term of (6.2) is deterministic. The expectation of the first
term can be split into square bias and variance terms. We find that the
expectation of (6.2) is given by

\[ \sum_{i=1}^\infty\frac{i^{2+4\alpha+4p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^2} + n\sum_{i=1}^\infty\frac{i^{2p}}{(i^{1+2\alpha+2p}+n)^2} + \sum_{i=1}^\infty\frac{i^{2p}}{i^{1+2\alpha+2p}+n}. \tag{6.3} \]
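
The bias-variance decomposition behind (6.3) can be verified term by term. The sketch below is an illustration under stated assumptions: $\kappa_i = i^{-p}$, a truncated truth, and hypothetical function names. It computes the three sums of (6.3) and compares their total with a direct computation from the posterior mean $\hat\mu_{\alpha,i} = ni^p(i^{1+2\alpha+2p}+n)^{-1}Y_i$ with $Y_i \sim N(i^{-p}\mu_{0,i}, 1/n)$.

```python
def expected_risk_terms(alpha, n, mu0, p):
    """Return (squared bias, variance, posterior spread) as in (6.3), truncated to len(mu0)."""
    bias2 = var = spread = 0.0
    for i, m in enumerate(mu0, start=1):
        w = i ** (1.0 + 2.0 * alpha + 2.0 * p)
        bias2 += w**2 * m**2 / (w + n) ** 2
        var += n * i ** (2.0 * p) / (w + n) ** 2
        spread += i ** (2.0 * p) / (w + n)
    return bias2, var, spread


def expected_risk_direct(alpha, n, mu0, p):
    """Same quantity from first principles, coordinate by coordinate."""
    total = 0.0
    for i, m in enumerate(mu0, start=1):
        w = i ** (1.0 + 2.0 * alpha + 2.0 * p)
        shrink = n * i**p / (w + n)            # posterior mean is shrink * Y_i
        bias = shrink * i ** (-p) * m - m      # E(posterior mean) - truth
        total += bias**2 + shrink**2 / n + i ** (2.0 * p) / (w + n)
    return total


# Illustrative truth mu_{0,i} = i^{-3/2}, 500 coordinates.
n, p, alpha = 10**4, 0.5, 1.0
mu0 = [i ** -1.5 for i in range(1, 501)]
terms = expected_risk_terms(alpha, n, mu0, p)
print(terms, sum(terms), expected_risk_direct(alpha, n, mu0, p))
```

The second and third components do not involve `mu0`, in line with the remark that these terms are independent of the truth.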
Note that the second and third terms in (6.3) are independent of $\mu_0$ and
both bounded by

\[ \sum_{i=1}^\infty\frac{i^{2p}}{i^{1+2\alpha+2p}+n}. \]

By Lemma 8.2 (with $m = 0$, $l = 1$, $r = 1+2\alpha+2p$ and $s = 2p$) this is for
$\alpha \ge \underline\alpha_n$ further bounded by a constant times

\[ n^{-2\alpha/(1+2\alpha+2p)} \le n^{-2\underline\alpha_n/(1+2\underline\alpha_n+2p)}. \]

In view of Lemma 2.1.(i), the right-hand side is bounded by a constant times
$n^{-2\beta/(1+2\beta+2p)}$ for large $n$.

It remains to consider the first sum in (6.3). The supremum over $Q_n$ is
easily dealt with. If $\mu_0 \in Q_n$, the first sum in (6.3) equals $\mu_{0,1}^2/(1+n)^2$,
whence it is bounded by $R^2/n^2$ uniformly in $\alpha$.

For the supremum over $P_n$ we divide the first sum in (6.3) into three parts
and show that each of the parts has the stated order. First we note that

\[ \sum_{i>n^{1/(1+2\beta+2p)}}\frac{i^{2+4\alpha+4p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^2} \le \sum_{i>n^{1/(1+2\beta+2p)}}\mu_{0,i}^2 \le \|\mu_0\|_\beta^2\,n^{-2\beta/(1+2\beta+2p)}. \tag{6.4} \]

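Next, observe that elementary calculus shows that for $\alpha > 0$ and $n \ge e$,
the maximum of the function $i \mapsto i^{1+2\alpha+4p}/\log i$ over the interval
$[2, n^{1/(1+2\alpha+2p)}]$ is taken at $i = n^{1/(1+2\alpha+2p)}$. It follows that for $\alpha > 0$,

\[ \sum_{i\le n^{1/(1+2\alpha+2p)}}\frac{i^{2+4\alpha+4p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^2} = \frac{\mu_{0,1}^2}{(1+n)^2} + \frac{1}{n^2}\sum_{2\le i\le n^{1/(1+2\alpha+2p)}}\frac{i^{1+2\alpha+4p}}{\log i}\,\frac{n^2 i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2} \]
\[ \le \frac{\mu_{0,1}^2}{(1+n)^2} + n^{-2\alpha/(1+2\alpha+2p)}h_n(\alpha). \]

Hence, since $x \mapsto x/(c+x)$ is increasing for every $c > 0$, we have

\[ \sup_{\mu_0\in P_n,\ \underline\alpha_n\le\alpha\le\overline\alpha_n}\ \sum_{i\le n^{1/(1+2\overline\alpha_n+2p)}}\frac{i^{2+4\alpha+4p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^2} \le \frac{R^2}{(1+n)^2} + n^{-2\underline\alpha_n/(1+2\underline\alpha_n+2p)}h_n(\overline\alpha_n) = \frac{R^2}{(1+n)^2} + Ln^{-2\underline\alpha_n/(1+2\underline\alpha_n+2p)}\log^2 n. \]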
By Lemma 2.1, $\underline\alpha_n \ge \beta - c_0/\log n$ for a constant $c_0 > 0$ only depending on
$R$. Hence, using again that $x \mapsto x/(c+x)$ is increasing for every $c > 0$ the
right-hand side is bounded by a constant times $n^{-2\beta/(1+2\beta+2p)}\log^2 n$.

To complete the proof we deal with the terms between $2 \vee n^{1/(1+2\overline\alpha_n+2p)}$
and $n^{1/(1+2\beta+2p)}$. (For $\overline\alpha_n \ge (\log n)/(2\log 2) - 1/2 - p$ the expression
$n^{1/(1+2\overline\alpha_n+2p)}$ is not greater than 2.) Let $J = J(n)$ be the smallest integer
such that $(\overline\alpha_n \wedge ((\log n)/(2\log 2) - 1/2 - p))/(1+1/\log n)^J \le \beta$. One
can see that $J$ is bounded above by a multiple of $(\log n)(\log\log n)$ for any
positive $\beta$. We partition the summation range under consideration into $J$
pieces using the auxiliary numbers

\[ b_j = 1 + 2\,\frac{\overline\alpha_n\wedge((\log n)/(2\log 2)-1/2-p)}{(1+1/\log n)^j} + 2p, \qquad j = 0,\ldots,J. \]
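
The geometric grid $b_0, \ldots, b_J$ is easy to tabulate. The following sketch is an illustration with hypothetical names and example values for $n$, $\beta$ and $p$; passing `alpha_bar = inf` simply selects the cap $(\log n)/(2\log 2) - 1/2 - p$ in the minimum.

```python
import math


def partition_exponents(n, beta, alpha_bar, p):
    """Return the list [b_0, ..., b_J] for the geometric grid described above."""
    log_n = math.log(n)
    top = min(alpha_bar, log_n / (2.0 * math.log(2.0)) - 0.5 - p)
    shrink = 1.0 + 1.0 / log_n
    J = 0
    while top / shrink**J > beta:  # smallest J with top / shrink^J <= beta
        J += 1
    return [1.0 + 2.0 * top / shrink**j + 2.0 * p for j in range(J + 1)]


n, beta, p = 10**6, 1.0, 1.0
b = partition_exponents(n, beta, alpha_bar=float("inf"), p=p)
J = len(b) - 1
print(J, math.log(n) * math.log(math.log(n)))
```

The printed pair lets one compare $J$ with $(\log n)(\log\log n)$ for the chosen example values.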

Note that the sequence $b_j$ is decreasing. Now we have

\[ \sum_{i=2\vee n^{1/(1+2\overline\alpha_n+2p)}}^{n^{1/(1+2\beta+2p)}}\frac{i^{2+4\alpha+4p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^2} \le \sum_{j=0}^{J-1}\sum_{i=n^{1/b_j}}^{n^{1/b_{j+1}}}\mu_{0,i}^2 \le 4\sum_{j=0}^{J-1}\sum_{i=n^{1/b_j}}^{n^{1/b_{j+1}}}\frac{n^2\mu_{0,i}^2}{(i^{b_{j+1}}+n)^2}, \]

and the upper bound is uniform in $\alpha$. Since $(b_j - b_{j+1})\log n = b_{j+1} - 1 - 2p$,
it holds for $n^{1/b_j} \le i \le n^{1/b_{j+1}}$ that $i^{b_j-b_{j+1}} \le n^{1/\log n} = e$. On the same
interval $i^{2p}$ is bounded by $n^{2p/b_{j+1}}$. Therefore the right hand side of the
preceding display is further bounded by a constant times

\[ \sum_{j=0}^{J-1} n^{2p/b_{j+1}-1}\,h_n\Bigl(\frac{\overline\alpha_n\wedge((\log n)/(2\log 2)-1/2-p)}{(1+1/\log n)^{j+1}}\Bigr)\frac{n^{1/b_{j+1}}\log n}{b_{j+1}} \]
\[ \le (\log n)\sum_{j=0}^{J-1} n^{(1+2p)/b_{j+1}-1}\,h_n(b_{j+1}/2-1/2-p) \]
\[ \le (\log n)\,n^{-\frac{2\beta/(1+1/\log n)}{1+2\beta/(1+1/\log n)+2p}}\sum_{j=0}^{J-1}h_n(b_{j+1}/2-1/2-p). \]

In the last step we used the fact that by construction, $b_j/2 - 1/2 - p \ge \beta/(1+1/\log n)$.
It follows from the definition of $\overline\alpha_n$ and $b_j$ that $h_n(b_{j+1}/2-1/2-p)$
is bounded above by $L(\log n)^2$ for every $j \le J-1$, and we recall that
$J = J(n)$ is bounded above by a multiple of $(\log n)(\log\log n)$. Finally we
note that

\[ n^{-\frac{2\beta/(1+1/\log n)}{1+2\beta/(1+1/\log n)+2p}} \le e\,n^{-2\beta/(1+2\beta+2p)}. \]

Therefore the first sum in (6.3) over the range $[2 \vee n^{1/(1+2\overline\alpha_n+2p)}, n^{1/(1+2\beta+2p)}]$
is bounded above by a multiple of $n^{-2\beta/(1+2\beta+2p)}(\log n)^3(\log\log n)$, in the
appropriate uniform sense over $P_n$.

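6.2. Bound for the centered posterior risk. To complete the proof of
Theorem 2.3 we show in this section that we also have

\[ \sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\sup_{\alpha\in[\underline\alpha_n,\,\overline\alpha_n\wedge\log n]}\Bigl|\sum_{i=1}^\infty(\hat\mu_{\alpha,i}-\mu_{0,i})^2 - \mathrm{E}_0\sum_{i=1}^\infty(\hat\mu_{\alpha,i}-\mu_{0,i})^2\Bigr| = O(\varepsilon_n^2), \]

for $\varepsilon_n = n^{-\beta/(1+2\beta+2p)}(\log n)^{3/2}(\log\log n)^{1/2}$. Using the explicit expression
for the posterior mean $\hat\mu_\alpha$ we see that the random variable in the supremum
is the absolute value of $V(\alpha)/n - 2W(\alpha)/\sqrt{n}$, where

\[ V(\alpha) = \sum_{i=1}^\infty\frac{n^2 i^{2p}}{(i^{1+2\alpha+2p}+n)^2}(Z_i^2-1), \qquad W(\alpha) = \sum_{i=1}^\infty\frac{n\,i^{1+2\alpha+3p}\mu_{0,i}}{(i^{1+2\alpha+2p}+n)^2}Z_i. \]

We deal with the two processes separately.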

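For the process $V$, Corollary 2.2.5 in [29] implies that

\[ \mathrm{E}_0\sup_{\alpha\in[\underline\alpha_n,\infty)}|V(\alpha)| \lesssim \sup_{\alpha\in[\underline\alpha_n,\infty)}\sqrt{\operatorname{var}_0V(\alpha)} + \int_0^{\mathrm{diam}_n}\sqrt{N(\varepsilon, [\underline\alpha_n,\infty), d_n)}\,d\varepsilon, \]

where $d_n^2(\alpha_1,\alpha_2) = \operatorname{var}_0(V(\alpha_1) - V(\alpha_2))$ and $\mathrm{diam}_n$ is the $d_n$-diameter of
$[\underline\alpha_n,\infty)$. Now the variance of $V(\alpha)$ is equal to

\[ \operatorname{var}_0V(\alpha) = 2n^4\sum_{i=1}^\infty\frac{i^{4p}}{(i^{1+2\alpha+2p}+n)^4}, \]

since $\operatorname{var}_0Z_i^2 = 2$. Using Lemma 8.2 (with $m = 0$, $l = 4$, $r = 1+2\alpha+2p$
and $s = 4p$), we can conclude that the variance of $V(\alpha)$ is bounded above by
a multiple of $n^{(1+4p)/(1+2\alpha+2p)}$. It follows that the diameter of the interval satisfies
$\mathrm{diam}_n \lesssim n^{(1+4p)/(1+2\underline\alpha_n+2p)}$. To compute the covering number of the interval
$[\underline\alpha_n, \overline\alpha_n]$ we first note that for $0 < \alpha_1 < \alpha_2$,

\[ \operatorname{var}_0\bigl(V(\alpha_1)-V(\alpha_2)\bigr) = \sum_{i=2}^\infty\Bigl(\frac{n^2 i^{2p}}{(i^{1+2\alpha_1+2p}+n)^2} - \frac{n^2 i^{2p}}{(i^{1+2\alpha_2+2p}+n)^2}\Bigr)^2\operatorname{var}_0Z_i^2 \]
\[ \le 2\sum_{i=2}^\infty\frac{n^4 i^{4p}}{(i^{1+2\alpha_1+2p}+n)^4} \le 2n^4\sum_{i=2}^\infty i^{-4-8\alpha_1-4p} \lesssim n^4 2^{-8\alpha_1}. \]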
Hence for $\varepsilon > 0$, a single $\varepsilon$-ball covers the whole interval $[K\log(n/\varepsilon), \infty)$ for
some constant $K > 0$. By Lemma 6.1, the distance $d_n(\alpha_1,\alpha_2)$ is bounded
above by a multiple of $|\alpha_1 - \alpha_2|\,n^{(1+4p)/(2+4\underline\alpha_n+4p)}(\log n)$. Therefore the
covering number of the interval $[\underline\alpha_n, K\log(n/\varepsilon)]$ relative to the metric $d_n$ is
bounded above by a multiple of $(\log n)\,n^{(1+4p)/(2+4\underline\alpha_n+4p)}(\log(n/\varepsilon))/\varepsilon$.
Combining everything we see that

\[ \mathrm{E}_0\sup_{\alpha\in[\underline\alpha_n,\infty)}|V(\alpha)| \lesssim n^{\frac{1+4p}{2+4\underline\alpha_n+4p}}(\log n). \]

By the fact that $x \mapsto x/(x+c)$ is increasing and Lemma 2.1.(i), the right-hand
side divided by $n$ is bounded by

\[ n^{-\frac{1+4\underline\alpha_n}{2+4\underline\alpha_n+4p}}(\log n) \lesssim n^{-2\beta/(1+2\beta+2p)}(\log n). \]

It remains to deal with the process $W$. The basic line of reasoning is
the same as followed above for $V$. An essential difference however is the
derivation of a bound for the variance of $W$, of which we provide the details.
The rest of the proof is left to the reader. The variance of $W(\alpha)/\sqrt{n}$ is given
by

\[ \operatorname{var}_0\frac{W(\alpha)}{\sqrt{n}} = \sum_{i=1}^\infty\frac{n\,i^{2+4\alpha+6p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^4}. \]

We show that uniformly for $\alpha \in [\underline\alpha_n, \overline\alpha_n]$, this variance is bounded above by
a constant (which depends only on $\|\mu_0\|_\beta$) times $n^{-(1+4\beta)/(1+2\beta+2p)}(\log n)^2$.
For the sum over $i \le n^{1/(1+2\alpha+2p)}$ we have

\[ \sum_{i\le n^{1/(1+2\alpha+2p)}}\frac{n\,i^{2+4\alpha+6p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^4} \le \frac{n\mu_{0,1}^2}{(1+n)^4} + \frac{1}{n^3}\sum_{2\le i\le n^{1/(1+2\alpha+2p)}}\frac{n^2 i^{1+2\alpha+6p}(\log i)^{-1}\,i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2} \tag{6.5} \]
\[ \le \frac{\|\mu_0\|^2}{n^3} + \frac{n^{4p/(1+2\alpha+2p)}(1+2\alpha+2p)}{n^2\log n}\sum_{i\le n^{1/(1+2\alpha+2p)}}\frac{n^2 i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2} \]
\[ \le \frac{\|\mu_0\|^2}{n^3} + n^{-\frac{1+4\alpha}{1+2\alpha+2p}}h_n(\alpha). \]

We have used again the fact that on the range $i \le n^{1/(1+2\alpha+2p)}$, the quantity
$i^{1+2\alpha+6p}(\log i)^{-1}$ is maximal for the largest $i$. Now the function $x \mapsto -(1+2x)/(x+c)$
is decreasing on $(0,\infty)$ for any $c > 1/2$. Moreover $h_n(\alpha) \le
L(\log n)^2$ for any $\alpha \le \overline\alpha_n$, thus the preceding display is bounded above by a
multiple of $n^{-(1+4\underline\alpha_n)/(1+2\underline\alpha_n+2p)}(\log n)^2$. Using Lemma 2.1.(i) this is further
bounded by a constant times $n^{-(1+4\beta)/(1+2\beta+2p)}(\log n)^2$.

Next we consider the sum over the range $i > n^{1/(1+2\alpha+2p)}$. We distinguish
two cases according to the value of $\alpha$. First suppose that $1+2\alpha \ge 2p$. Then
$i^{-1-2\alpha+2p}(\log i)^{-1}$ is decreasing in $i$, hence

\[ \sum_{i>n^{1/(1+2\alpha+2p)}}\frac{n\,i^{2+4\alpha+6p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^4} \le \frac{1}{n}\sum_{i>n^{1/(1+2\alpha+2p)}}\frac{n^2 i^{-1-2\alpha+2p}(\log i)^{-1}\,i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2} \]
\[ \le \frac{1+2\alpha+2p}{n^{(2+4\alpha)/(1+2\alpha+2p)}\log n}\sum_{i>n^{1/(1+2\alpha+2p)}}\frac{n^2 i^{1+2\alpha}\mu_{0,i}^2\log i}{(i^{1+2\alpha+2p}+n)^2} \le n^{-\frac{1+4\alpha}{1+2\alpha+2p}}h_n(\alpha). \]

As above, this is further bounded by a constant times the desired rate
$n^{-(1+4\beta)/(1+2\beta+2p)}(\log n)^2$. If $1+2\alpha < 2p$, then

\[ \sum_{i>n^{1/(1+2\alpha+2p)}}\frac{n\,i^{2+4\alpha+6p}\mu_{0,i}^2}{(i^{1+2\alpha+2p}+n)^4} \le n\sum_{i>n^{1/(1+2\alpha+2p)}}i^{-2-4\alpha-2p-2\beta}\,i^{2\beta}\mu_{0,i}^2 \le \|\mu_0\|_\beta^2\,n^{\frac{2p-2\beta}{1+2\alpha+2p}-1}. \]

Since $\underline\alpha_n \ge \beta - c_0/\log n$, we have $1+2\alpha \ge 2\beta$ for large enough $n$, for any
$\alpha \in [\underline\alpha_n, \overline\alpha_n]$. Since we have assumed $1+2\alpha < 2p$, this implies that $2p > 2\beta$.
Therefore the right hand side of the preceding display attains its maximum
at $\alpha = \underline\alpha_n$. Using again that $\underline\alpha_n \ge \beta - c_0/\log n$, it is straightforward to show
that for $\alpha \in [\underline\alpha_n, \overline\alpha_n]$,

\[ n^{\frac{2p-2\beta}{1+2\alpha+2p}-1} \le n^{\frac{2p-2\beta}{1+2\underline\alpha_n+2p}-1} \le e^{4c_0}\,n^{-\frac{1+4\beta}{1+2\beta+2p}}. \]

This completes the proof.

6.3. Bounds for the semimetrics associated to $V$ and $W$. The following
lemma is used in Section 6.2.

Lemma 6.1. For any $\underline\alpha_n \le \alpha_1 < \alpha_2$ the following inequalities hold:

\[ \operatorname{var}_0\bigl(V(\alpha_1)-V(\alpha_2)\bigr) \lesssim (\alpha_1-\alpha_2)^2\,n^{(1+4p)/(1+2\underline\alpha_n+2p)}(\log n)^2, \]
\[ \operatorname{var}_0\Bigl(\frac{W(\alpha_1)}{\sqrt{n}}-\frac{W(\alpha_2)}{\sqrt{n}}\Bigr) \lesssim (\alpha_1-\alpha_2)^2\,n^{-(1+4\beta)/(1+2\beta+2p)}(\log n)^4, \]

with a constant that does not depend on $\alpha$ and $\mu_0$.

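Proof. The left-hand side of the first inequality is equal to

\[ n^4\sum_{i=1}^\infty\bigl(f_i(\alpha_1)-f_i(\alpha_2)\bigr)^2 i^{4p}\operatorname{var}_0Z_i^2, \]

where $f_i(\alpha) = (i^{1+2\alpha+2p}+n)^{-2}$. The derivative of $f_i$ is given by $f_i'(\alpha) =
-4i^{1+2\alpha+2p}(\log i)/(i^{1+2\alpha+2p}+n)^3$, hence the preceding display is bounded
above by a multiple of

\[ (\alpha_1-\alpha_2)^2 n^4\sup_{\alpha\in[\alpha_1,\alpha_2]}\sum_{i=1}^\infty\frac{i^{2+4\alpha+8p}(\log i)^2}{(i^{1+2\alpha+2p}+n)^6} \]
\[ \lesssim (\alpha_1-\alpha_2)^2 n^4(\log n)^2\sup_{\alpha\in[\alpha_1,\alpha_2]}\frac{1}{(1+2\alpha+2p)^2}\sum_{i=1}^\infty\frac{i^{4p}}{(i^{1+2\alpha+2p}+n)^4} \]
\[ \lesssim (\alpha_1-\alpha_2)^2(\log n)^2\sup_{\alpha\in[\alpha_1,\alpha_2]}n^{(1+4p)/(1+2\alpha+2p)}, \]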

with the help of Lemma 8.4 (with r = 1 + 2a + 2p, and m = 2), and 
Lemma 8.2 (with m = 0, I = 4, r = 1 + 2a + 2p, and s = r + 4p). Since 
a > a n , we get the first assertion of the lemma. 

We next consider $W/\sqrt{n}$. The left-hand side of the second inequality in
the statement of the lemma is equal to

\[ \sum_{i=1}^\infty\bigl(f_i(\alpha_1)-f_i(\alpha_2)\bigr)^2 n\mu_{0,i}^2\operatorname{var}_0Z_i, \]

where now $f_i(\alpha) = i^{1+2\alpha+3p}/(i^{1+2\alpha+2p}+n)^2$. The derivative of this $f_i$ satisfies
$|f_i'(\alpha)| \le 2(\log i)f_i(\alpha)$, hence we get the upper bound

\[ 4(\alpha_2-\alpha_1)^2\sup_{\alpha\in[\alpha_1,\alpha_2]}\sum_{i=1}^\infty\frac{n\,i^{2+4\alpha+6p}\mu_{0,i}^2\log^2 i}{(i^{1+2\alpha+2p}+n)^4}. \]

The proof is completed by arguing as in (6.5). ■

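7. Proof of Theorem 2.5. Again we only provide details for the
Sobolev case. Let $A_n$ be the event that $\hat\alpha_n \in [\underline\alpha_n, \overline\alpha_n]$. Then with
$\alpha \mapsto \lambda_n(\alpha \mid Y)$ denoting the posterior Lebesgue density of $\alpha$, we have

\[ \sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\Pi\bigl(\|\mu-\mu_0\|\ge M_nL_nn^{-\beta/(1+2\beta+2p)} \mid Y\bigr) \tag{7.1} \]
\[ \le \sup_{\|\mu_0\|_\beta\le R}P_0(A_n^c) + \sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\int_0^{\underline\alpha_n}\lambda_n(\alpha \mid Y)\,d\alpha\,1_{A_n} \]
\[ \quad + \sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\int_{\underline\alpha_n}^\infty\lambda_n(\alpha \mid Y)\,\Pi_\alpha\bigl(\|\mu-\mu_0\|\ge M_nL_nn^{-\beta/(1+2\beta+2p)} \mid Y\bigr)\,d\alpha\,1_{A_n}. \]

By Theorem 2.2 the first term on the right vanishes as $n \to \infty$, provided $l$
and $L$ in the definitions of $\underline\alpha_n$ and $\overline\alpha_n$ are chosen small and large enough,
respectively. We will show that the other terms tend to 0 as well.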

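Observe that $\lambda_n(\alpha \mid Y) \propto L_n(\alpha)\lambda(\alpha)$, where $L_n(\alpha) = \exp(\ell_n(\alpha))$, for $\ell_n$
the random function defined by (2.2). In Section 5.3 we have shown that on
the interval $(0, \underline\alpha_n + 1/\log n]$

\[ \ell_n'(\alpha) = M_n(\alpha) \gtrsim \frac{n^{1/(1+2\alpha+2p)}\log n}{1+2\alpha+2p}, \]

on the event $A_n$. Therefore on the interval $(0, \underline\alpha_n]$ we have

\[ \ell_n(\alpha) \le \ell_n(\underline\alpha_n) \le \ell_n\Bigl(\underline\alpha_n+\frac{1}{2\log n}\Bigr) - \frac{Kn^{1/(1+2\underline\alpha_n+2p)}}{1+2\underline\alpha_n+2p} \]

for some $K > 0$ and on the interval $[\underline\alpha_n + 1/(2\log n), \underline\alpha_n + 1/\log n]$,

\[ \ell_n(\alpha) \ge \ell_n\Bigl(\underline\alpha_n+\frac{1}{2\log n}\Bigr). \]

For the likelihood $L_n$ we have the corresponding bounds

\[ L_n(\alpha) \le \exp\Bigl(-\frac{Kn^{1/(1+2\underline\alpha_n+2p)}}{1+2\underline\alpha_n+2p}\Bigr)L_n\Bigl(\underline\alpha_n+\frac{1}{2\log n}\Bigr) \]

for $\alpha \in (0, \underline\alpha_n]$ and

\[ L_n(\alpha) \ge L_n\Bigl(\underline\alpha_n+\frac{1}{2\log n}\Bigr) \]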
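for $\alpha \in [\underline\alpha_n + 1/(2\log n), \underline\alpha_n + 1/\log n]$ on the event $A_n$. Using these estimates
for $L_n$ we obtain the following upper bound for the second term on the
right-hand side of (7.1):

\[ \sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\frac{\int_0^{\underline\alpha_n}\lambda(\alpha)L_n(\alpha)\,d\alpha}{\int_0^\infty\lambda(\alpha)L_n(\alpha)\,d\alpha}1_{A_n} \tag{7.2} \]
\[ \le \sup_{\|\mu_0\|_\beta\le R}\mathrm{E}_0\exp\Bigl(-\frac{Kn^{1/(1+2\underline\alpha_n+2p)}}{1+2\underline\alpha_n+2p}\Bigr)\frac{L_n\bigl(\underline\alpha_n+\frac{1}{2\log n}\bigr)\int_0^{\underline\alpha_n}\lambda(\alpha)\,d\alpha}{L_n\bigl(\underline\alpha_n+\frac{1}{2\log n}\bigr)\int_{\underline\alpha_n+1/(2\log n)}^{\underline\alpha_n+1/\log n}\lambda(\alpha)\,d\alpha} \]
\[ \le \sup_{\|\mu_0\|_\beta\le R}\exp\Bigl(-\frac{Kn^{1/(1+2\underline\alpha_n+2p)}}{1+2\underline\alpha_n+2p}\Bigr)\Bigl(\int_{\underline\alpha_n+1/(2\log n)}^{\underline\alpha_n+1/\log n}\lambda(\alpha)\,d\alpha\Bigr)^{-1}. \]

From Lemma 2.1 we know that $\underline\alpha_n \ge \beta/2$ for large enough $n$, hence by
Assumption 2.4, Lemma 7.1, and the definition of $\underline\alpha_n$,

\[ \int_{\underline\alpha_n+1/(2\log n)}^{\underline\alpha_n+1/\log n}\lambda(\alpha)\,d\alpha \ge C_1(2\log n)^{-C_2}\exp\bigl(-C_3\exp(\sqrt{\log n}/3)\bigr) \]

for some $C_1, C_2, C_3 > 0$. Therefore the right hand side of (7.2) is bounded
above by a constant times

\[ \exp\Bigl(-\frac{Kn^{1/(1+2\sqrt{\log n}+2p)}}{1+2\sqrt{\log n}+2p}\Bigr)(\log n)^{C_2}\exp\bigl(C_3\exp(\sqrt{\log n}/3)\bigr). \]

It is easy to see that this quantity tends to 0 as $n \to \infty$.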

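In bounding the third term on the right hand side of (7.1) we may replace
the supremum over $\|\mu_0\|_\beta \le R$ by the supremum over the set $P_n$
defined in Section 6.1, since otherwise $\overline\alpha_n = \infty$. For $\mu_0 \in P_n$ we have
$\overline\alpha_n \le \log n/(2\log 2) - 1/2 - p$ (Lemma 2.1). We then write the third term
as

\[ \sup_{\mu_0\in P_n}\mathrm{E}_0\Bigl(\int_{\underline\alpha_n}^{\overline\alpha_n}\lambda_n(\alpha \mid Y)\,\Pi_\alpha\bigl(\|\mu-\mu_0\|\ge M_nL_nn^{-\beta/(1+2\beta+2p)} \mid Y\bigr)\,d\alpha \tag{7.3} \]
\[ \qquad + \int_{\overline\alpha_n}^\infty\lambda_n(\alpha \mid Y)\,\Pi_\alpha\bigl(\|\mu-\mu_0\|\ge M_nL_nn^{-\beta/(1+2\beta+2p)} \mid Y\bigr)\,d\alpha\Bigr)1_{A_n}. \]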

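The first term in (7.3) is bounded above by

\[ \sup_{\mu_0\in P_n}\mathrm{E}_0\sup_{\alpha\in[\underline\alpha_n,\overline\alpha_n]}\Pi_\alpha\bigl(\|\mu-\mu_0\|\ge M_nL_nn^{-\beta/(1+2\beta+2p)} \mid Y\bigr). \]

This goes to zero, as we have shown in the proof of Theorem 2.3. In
Section 5.1 we have shown that the log-likelihood $\ell_n$, whose derivative is $M_n$, can
increase on the interval $[\overline\alpha_n, \infty)$ by at most a multiple of

\[ \frac{n^{1/(1+2\overline\alpha_n+2p)}(\log n)^2}{1+2\overline\alpha_n+2p}. \]

Moreover, in Section 5.2 we have shown that for $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n]$,

\[ \ell_n'(\alpha) = M_n(\alpha) \le -M\,\frac{n^{1/(1+2\overline\alpha_n+2p)}(\log n)^3}{1+2\overline\alpha_n+2p} \]

on the event $A_n$, and $M$ can be made arbitrarily large by increasing the
constant $L$ in the definition of $\overline\alpha_n$. Therefore the integral of $M_n(\alpha)$ on
$[\overline\alpha_n - 1/\log n, \overline\alpha_n - 1/(2\log n)]$ is bounded above by

\[ -\frac{M}{2}\,\frac{n^{1/(1+2\overline\alpha_n+2p)}(\log n)^2}{1+2\overline\alpha_n+2p} \]

and by choosing a large enough constant $L$ in the definition of $\overline\alpha_n$ it holds
that for some $N > 0$,

\[ \ell_n(\alpha) \le \ell_n\Bigl(\overline\alpha_n-\frac{1}{2\log n}\Bigr) - N\,\frac{n^{1/(1+2\overline\alpha_n+2p)}(\log n)^2}{1+2\overline\alpha_n+2p} \]

for $\alpha \in [\overline\alpha_n, \infty)$, and

\[ \ell_n(\alpha) \ge \ell_n\Bigl(\overline\alpha_n-\frac{1}{2\log n}\Bigr) \]

for $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n - 1/(2\log n)]$. These bounds lead to the following
bounds for the likelihood:

\[ L_n(\alpha) \le L_n\Bigl(\overline\alpha_n-\frac{1}{2\log n}\Bigr)\exp\Bigl(-N\,\frac{n^{1/(1+2\overline\alpha_n+2p)}(\log n)^2}{1+2\overline\alpha_n+2p}\Bigr) \]

for $\alpha \in [\overline\alpha_n, \infty)$, and

\[ L_n(\alpha) \ge L_n\Bigl(\overline\alpha_n-\frac{1}{2\log n}\Bigr) \]

for $\alpha \in [\overline\alpha_n - 1/\log n, \overline\alpha_n - 1/(2\log n)]$. Similarly to the upper bound for the
second term of (7.1) we now write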



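\[ \sup_{\mu_0\in P_n}\mathrm{E}_0\int_{\overline\alpha_n}^\infty\lambda_n(\alpha \mid Y)\,d\alpha \le \sup_{\mu_0\in P_n}\mathrm{E}_0\frac{\int_{\overline\alpha_n}^\infty\lambda(\alpha)L_n(\alpha)\,d\alpha}{\int_0^\infty\lambda(\alpha)L_n(\alpha)\,d\alpha} \]
\[ \le \sup_{\mu_0\in P_n}\exp\Bigl(-N\,\frac{n^{1/(1+2\overline\alpha_n+2p)}(\log n)^2}{1+2\overline\alpha_n+2p}\Bigr)\frac{\int_{\overline\alpha_n}^\infty\lambda(\alpha)\,d\alpha}{\int_{\overline\alpha_n-1/\log n}^{\overline\alpha_n-1/(2\log n)}\lambda(\alpha)\,d\alpha}. \]

Since $\overline\alpha_n \ge \underline\alpha_n \ge \beta/2$ for $n$ large enough, Assumption 2.4 and Lemma 7.1
imply that

\[ \frac{\int_{\overline\alpha_n}^\infty\lambda(\alpha)\,d\alpha}{\int_{\overline\alpha_n-1/\log n}^{\overline\alpha_n-1/(2\log n)}\lambda(\alpha)\,d\alpha} \le C_4(\log n)^{C_5}\exp\bigl(C_6\overline\alpha_n^{C_7}\bigr). \]

Since $\overline\alpha_n \le \log n/(2\log 2) - 1/2 - p$, the right-hand side of the preceding
display is bounded above by

\[ C_4\exp\bigl(-2N(\log 2)(\log n)\bigr)(\log n)^{C_5}\exp\Bigl(C_6\Bigl(\frac{\log n}{2\log 2}-\frac12-p\Bigr)^{C_7}\Bigr), \]

which tends to zero for any fixed constant $C_7$ smaller than 1.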
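
The relation $\lambda_n(\alpha \mid Y) \propto L_n(\alpha)\lambda(\alpha)$ used throughout this section can be illustrated on a grid. The sketch below is not the simulation code of the paper: it assumes $\kappa_i = i^{-p}$, an assumed form of $\ell_n$ consistent with the derivative $M_n$ of Section 5, an exponential hyperprior $\lambda(\alpha) = e^{-\alpha}$ (one of the two shapes covered by Lemma 7.1), a truncated sequence model with simulated data, and hypothetical function names.

```python
import math
import random


def log_lik(alpha, n, y, p):
    """Assumed form of ell_n(alpha) with kappa_i = i^{-p}, truncated to len(y) terms."""
    total = 0.0
    for i, yi in enumerate(y, start=1):
        w = i ** (1.0 + 2.0 * alpha + 2.0 * p)
        total += math.log1p(n / w) - n**2 * yi**2 / (w + n)
    return -0.5 * total


def alpha_posterior_on_grid(grid, n, y, p, log_prior):
    """Normalised weights proportional to L_n(alpha) * lambda(alpha) on an equally spaced grid."""
    logs = [log_lik(a, n, y, p) + log_prior(a) for a in grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract the max to avoid overflow
    total = sum(weights)
    return [w / total for w in weights]


random.seed(0)
n, p, beta = 10**4, 1.0, 1.0
mu0 = [i ** (-beta - 1.0) for i in range(1, 301)]  # illustrative truth
y = [i ** (-p) * m + random.gauss(0.0, 1.0) / math.sqrt(n)
     for i, m in enumerate(mu0, start=1)]
grid = [0.05 * k for k in range(1, 101)]           # alpha in (0, 5]
post = alpha_posterior_on_grid(grid, n, y, p, log_prior=lambda a: -a)
print(grid[post.index(max(post))])                  # grid point with the largest weight
```

Working with log-weights and subtracting their maximum keeps the normalisation stable, since $L_n(\alpha)$ itself can be far outside floating-point range.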

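Lemma 7.1. Suppose that for $c_1, c_2, c_3 > 0$ and $c_4 > 1$, the prior density
$\lambda$ satisfies

\[ c_4^{-1}\alpha^{-c_2} \le \lambda(\alpha) \le c_4\alpha^{-c_2} \quad\text{or}\quad c_4^{-1}\exp(-c_3\alpha) \le \lambda(\alpha) \le c_4\exp(-c_3\alpha), \]

for $\alpha \ge c_1$. Then there exist positive constants $C_1, \ldots, C_6$ and $C_7 < 1$
depending on $c_1$ only such that for all $x \ge c_1$, every $\delta_n \to 0$, and $n$ large
enough

\[ \int_{x+\delta_n}^{x+2\delta_n}\lambda(\alpha)\,d\alpha \ge C_1\delta_n^{C_2}\exp\bigl(-C_3\exp(x/3)\bigr) \]

and

\[ \frac{\int_x^\infty\lambda(\alpha)\,d\alpha}{\int_{x-2\delta_n}^{x-\delta_n}\lambda(\alpha)\,d\alpha} \le C_4\delta_n^{-C_5}\exp\bigl(C_6x^{C_7}\bigr). \]

Proof. The proof only involves straightforward calculus. ■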

8. Auxiliary lemmas. In this section we collect several lemmas that 
we use throughout the proofs to upper and lower bound certain sums. 

Lemma 8.1. Let $c > 0$ and $r \ge 1 + c$.
(i) For $n \ge 1$

\[ \sum_{i=1}^\infty\frac{n\log i}{i^r+n} \le \Bigl(2+\frac{2}{c}+\frac{2}{c^2\log 2}\Bigr)\frac{n^{1/r}\log n}{r}. \]

(ii) If $r > (\log n)/(\log 2)$, then for $n \ge 1$

\[ \sum_{i=1}^\infty\frac{n\log i}{i^r+n} \le \Bigl(1+\frac{2}{c}+\frac{2}{c^2\log 2}\Bigr)(\log 2)\,n2^{-r}. \]


Proof. First consider r < (log n)/ (log 2), which implies that n x l r > 2. 
We split the series in two parts, and bound the denominator i r + n by n or 
i r . Since log i is increasing, we see that 

L" 1/r J 1/r , 

En I logn 
log*< . 

i=l 

Since f(x) = x _7 logx is decreasing for x > e 1 / 7 , we see that i~ r \ogi is 
decreasing on interval [[n 1 / 7 *],©©) for n > e. Therefore 



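\[
\sum_{i=\lceil n^{1/r}\rceil}^{\infty} \frac{n\log i}{i^r + n} \le n\,\frac{\log\lceil n^{1/r}\rceil}{\lceil n^{1/r}\rceil^{r}} + n\int_{\lceil n^{1/r}\rceil}^{\infty} \frac{\log x}{x^r}\,dx.
\]

Since $\lceil x\rceil/x \le 2$ for $x \ge 1$, and $n^{1/r} \ge 2$, 

\[
n\,\frac{\log\lceil n^{1/r}\rceil}{\lceil n^{1/r}\rceil^{r}} \le 2\log n^{1/r} \le \frac{n^{1/r}\log n}{r}.
\]

Moreover 

\[
\int_{\lceil n^{1/r}\rceil}^{\infty} \frac{\log x}{x^r}\,dx \le \int_{n^{1/r}}^{\infty} \frac{\log x}{x^r}\,dx = n^{1/r-1}\,\frac{(r-1)\log n^{1/r} + 1}{(r-1)^2}.
\]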

Since $r > 1 + c$, we have 

\[
\frac{\log n^{1/r}}{r-1} \le \frac{1}{c}\,\frac{\log n}{r}, \qquad \frac{1}{(r-1)^2} \le \frac{1}{(r-1)^2}\,\frac{\log n^{1/r}}{\log 2} \le \frac{1}{c^2\log 2}\,\frac{\log n}{r}.
\]

This proves (i) for the case $r \le (\log n)/(\log 2)$. 
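
The proof uses a closed form for the integral of $x^{-r}\log x$, once with lower limit $n^{1/r}$ and once with lower limit 2. As a brief check, integration by parts gives the following for any $r > 1$; the generic lower limit $a > 0$ is a symbol introduced here for illustration only.

```latex
\begin{align*}
\int_a^\infty x^{-r}\log x\,dx
  &= \Bigl[\frac{x^{1-r}}{1-r}\log x\Bigr]_a^\infty
     - \int_a^\infty \frac{x^{1-r}}{1-r}\cdot\frac{1}{x}\,dx \\
  &= \frac{a^{1-r}\log a}{r-1} + \frac{1}{r-1}\int_a^\infty x^{-r}\,dx \\
  &= a^{1-r}\,\frac{(r-1)\log a + 1}{(r-1)^2}.
\end{align*}
```

Taking $a = n^{1/r}$ gives the factor $n^{1/r-1}$, and taking $a = 2$ gives the factor $2^{1-r}$.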

We now consider $r > (\log n)/(\log 2)$, which implies that $n^{1/r} < 2$. We 
have 

\[
\sum_{i=1}^{\infty} \frac{n\log i}{i^r + n} \le n\sum_{i=2}^{\infty} \frac{\log i}{i^r} \le n2^{-r}\log 2 + n\int_2^{\infty} x^{-r}\log x\,dx,
\]

by monotonicity of the function $f$ defined above (with $\gamma = r$). We have 

\[
\int_2^{\infty} x^{-r}\log x\,dx = 2^{1-r}\,\frac{(r-1)\log 2 + 1}{(r-1)^2},
\]

and since $r > 1 + c$ 

\[
\frac{\log 2}{r-1} \le \frac{\log 2}{c}, \qquad \frac{1}{(r-1)^2} \le \frac{1}{c^2},
\]

which finishes the proof of (ii). 

To complete the proof of (i), we consider the function $f(x) = 2^{-x}x$ and 
note that it is decreasing for $x \ge 1/\log 2$. Therefore $n2^{-r} = (n2^{-r}r)/r \le 
(\log n)/(r\log 2)$, for $n \ge 3$. Since $1 \le n^{1/r}$, we get the desired result. ■ 
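
As a numerical sanity check of Lemma 8.1 (not a substitute for the proof), the following Python sketch compares a rigorous upper bound for the series with the right-hand sides of (i) and (ii). The function names, the cutoff and the parameter values are illustrative choices. The leading constant 2 in (i) is an assumed reading, namely the sum of the constants produced by the two parts of the proof.

```python
import math

def sum_upper_bound(n, r, cutoff=100_000):
    """Upper bound for sum_{i>=1} n*log(i) / (i**r + n), assuming r > 1.

    The first `cutoff` terms are summed exactly. The remaining terms are
    bounded by n * integral_{cutoff}^{inf} x**(-r) * log(x) dx, which is
    valid because x**(-r) * log(x) is decreasing for x >= exp(1/r).
    """
    head = sum(n * math.log(i) / (i**r + n) for i in range(1, cutoff + 1))
    tail = n * cutoff**(1 - r) * ((r - 1) * math.log(cutoff) + 1) / (r - 1)**2
    return head + tail

def bound_i(n, r, c):
    # Right-hand side of (i); the leading constant 2 is an assumed reading.
    const = 2 + 2 / c + 2 / (c**2 * math.log(2))
    return const * n**(1 / r) * math.log(n) / r

def bound_ii(n, r, c):
    # Right-hand side of (ii); only claimed when r > log(n)/log(2).
    const = 1 + 2 / c + 2 / (c**2 * math.log(2))
    return const * math.log(2) * n * 2.0**(-r)

# Illustrative parameter choices (n, r, c) with r > 1 + c.
for n, r, c in [(1000, 2.0, 0.9), (100, 1.5, 0.4), (10, 4.0, 2.5)]:
    print(n, r, c, sum_upper_bound(n, r) <= bound_i(n, r, c))
```

The triple (10, 4.0, 2.5) satisfies r > log(n)/log(2), so it can also be compared against `bound_ii`.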

Lemma 8.2. For any $m > 0$, $l \ge 1$, $r_0 > 0$, $r \in (0, r_0]$, $s \in (0, rl - 2]$, 
and $n \ge e^{2mr_0}$ 

\[
\sum_{i=1}^{\infty} \frac{i^s(\log i)^m}{(i^r + n)^l} \le 4n^{(1+s-lr)/r}\,\frac{(\log n)^m}{r^m}.
\]

The same upper bound holds for $m = 0$, $r \in (0, \infty)$, and $n \ge 1$. 

Proof. We deal with this sum by splitting it into the parts $i \le n^{1/r}$ 
and $i > n^{1/r}$. In the first range we bound the sum by 

\[
\sum_{i \le n^{1/r}} \frac{i^s(\log i)^m}{n^l} \le n^{1/r}\,\frac{n^{s/r}(\log n^{1/r})^m}{n^l},
\]

by monotonicity of the function $f(x) = x^s(\log x)^m$. 

Suppose that $m > 0$. The derivative of the function $f(x) = x^{-1/2}(\log x)^m$ 
is $f'(x) = x^{-3/2}(\log x)^{m-1}(m - (\log x)/2)$, hence it is monotone decreasing 
for $x \ge e^{2m}$. Since $n^{1/r} \ge n^{1/r_0}$ and $n \ge e^{2mr_0}$, the function $f$ is decreasing 
on the interval $[n^{1/r}, \infty)$. Therefore we bound the sum over the second range by 



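\[
\sum_{i=n^{1/r}}^{\infty} i^{s-rl}(\log i)^m \le n^{-1/(2r)}\,\frac{(\log n)^m}{r^m}\sum_{i=n^{1/r}}^{\infty} i^{1/2+s-rl}.
\]

Since $s \le rl - 2$, $i^{1/2+s-rl}$ is decreasing and $rl - s - 3/2 \ge 1/2$. We get 

\[
\sum_{i=n^{1/r}}^{\infty} i^{1/2+s-rl} \le n^{(1/2+s-rl)/r} + \int_{n^{1/r}}^{\infty} x^{1/2+s-rl}\,dx
= n^{(1/2+s-rl)/r} + \frac{1}{-3/2-s+rl}\,n^{(3/2+s-rl)/r}
\le 3n^{(3/2+s-rl)/r}.
\]

In the case $m = 0$, we use monotonicity of $i^{s-rl}$ for all $i \ge 1$. ■ 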
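
The bound of Lemma 8.2 can be checked numerically in the same spirit. The sketch below sums the first terms exactly and controls the remainder with the device used in the proof (factor out $i^{-1/2}(\log i)^m$, then compare with an integral). Function names, the cutoff and the parameter values are illustrative assumptions, not part of the paper.

```python
import math

def lemma82_sum_upper(n, r, l, s, m, cutoff=100_000):
    """Upper bound for sum_{i>=1} i**s * log(i)**m / (i**r + n)**l.

    Head: exact partial sum over i = 1..cutoff.
    Tail: for i > cutoff >= exp(2m),
        i**(s - r*l) * log(i)**m
            <= cutoff**(-1/2) * log(cutoff)**m * i**(1/2 + s - r*l),
    and the sum of i**(1/2 + s - r*l) over i > cutoff is at most the
    integral from cutoff to infinity (this needs r*l - s > 3/2).
    """
    if cutoff < math.exp(2 * m) or r * l - s <= 1.5:
        raise ValueError("tail bound not applicable for these parameters")
    head = sum(i**s * math.log(i)**m / (i**r + n)**l
               for i in range(1, cutoff + 1))
    tail = math.log(cutoff)**m * cutoff**(1 + s - r * l) / (r * l - s - 1.5)
    return head + tail

def lemma82_rhs(n, r, l, s, m):
    # Right-hand side 4 * n^((1+s-lr)/r) * (log n / r)^m of the lemma.
    return 4 * n**((1 + s - l * r) / r) * (math.log(n) / r)**m

# m > 0 case with r = r_0 = 2, so that n = 100 >= exp(2*m*r_0) = exp(4).
print(lemma82_sum_upper(100, 2.0, 2, 1.0, 1.0) <= lemma82_rhs(100, 2.0, 2, 1.0, 1.0))
# m = 0 case, where any n >= 1 is allowed.
print(lemma82_sum_upper(50, 3.0, 1, 1.0, 0.0) <= lemma82_rhs(50, 3.0, 1, 1.0, 0.0))
```

Both parameter sets satisfy $s \le rl - 2$, so the tail bound in the code applies.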

Lemma 8.3. For any $p > 0$, $r \in (1, (\log n)/(2\log(3e/2))]$, and $\gamma > 0$, 

\[
\sum_{i=1}^{\infty} \frac{n^{\gamma}\log i}{(i^r + n)^{\gamma}} \ge \frac{1}{3\cdot 2^{\gamma} r}\,n^{1/r}\log n.
\]
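Proof. In the range $i \le n^{1/r}$ we have $i^r + n \le 2n$, thus 

\[
\sum_{i=1}^{\infty} \frac{n^{\gamma}\log i}{(i^r + n)^{\gamma}} \ge \frac{1}{2^{\gamma}}\sum_{i=1}^{\lfloor n^{1/r}\rfloor} \log i \ge \frac{1}{2^{\gamma}}\int_1^{\lfloor n^{1/r}\rfloor} \log x\,dx \ge \frac{1}{2^{\gamma}}\int_1^{(2/3)n^{1/r}} \log x\,dx,
\]

since $n^{1/r} \ge 2$ and $\lfloor x\rfloor \ge 2x/3$ for $x \ge 2$. The latter integral equals 
$(2/3)n^{1/r}(\log((2/3)n^{1/r}) - 1) + 1$. Since $\log n \ge 2\log(3e/2)r$ implies that 
$(\log n)/(2r) \ge \log(3e/2)$, we have 

\[
\frac{2}{3}n^{1/r}\Bigl(\log\Bigl(\frac{2}{3}n^{1/r}\Bigr) - 1\Bigr) = \frac{2}{3}n^{1/r}\Bigl(\frac{1}{r}\log n - \log\frac{3e}{2}\Bigr) \ge \frac{1}{3r}\,n^{1/r}\log n.
\]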
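
Because all terms of the series in Lemma 8.3 are nonnegative, any partial sum is a lower bound for the full series, so the inequality can be checked numerically with finitely many terms. The following Python sketch does this; the function names and parameter values are illustrative assumptions.

```python
import math

def lemma83_partial_sum(n, r, gamma, terms):
    """Partial sum of n^gamma * log(i) / (i^r + n)^gamma over i = 1..terms.

    Every term is nonnegative, so this never exceeds the full series.
    """
    return sum(n**gamma * math.log(i) / (i**r + n)**gamma
               for i in range(1, terms + 1))

def lemma83_lower_bound(n, r, gamma):
    # Right-hand side n^(1/r) * log(n) / (3 * 2^gamma * r) of the lemma.
    return n**(1 / r) * math.log(n) / (3 * 2**gamma * r)

def r_is_admissible(n, r):
    # Range of r required by the lemma: 1 < r <= log(n) / (2 log(3e/2)).
    return 1 < r <= math.log(n) / (2 * math.log(1.5 * math.e))

# Summing only the terms with i <= n^(1/r) already exceeds the bound here.
print(r_is_admissible(1000, 2.0),
      lemma83_partial_sum(1000, 2.0, 1.0, 31) >= lemma83_lower_bound(1000, 2.0, 1.0))
```

Restricting to $i \le n^{1/r}$ mirrors the proof, which discards all later terms.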



Lemma 8.4. Let $m$, $i$, $r$, and $\xi$ be positive reals. Then for $n \ge e^m$ 

\[
\frac{n i^r (r\log i)^m}{(i^r + n)^2} \le (\log n)^m, \qquad \frac{n^{\xi}(r\log i)^{\xi m}}{(i^r + n)^{\xi}} \le (\log n)^{\xi m}.
\]

Proof. Assume first that $i \le n^{1/r}$; then the left-hand side of the first 
inequality is bounded above by 

\[
\frac{n^2 (r\log n^{1/r})^m}{n^2} = (\log n)^m.
\]

Next assume that $i > n^{1/r}$. The derivative of the function $f(x) = 
x^{-c}(\log x)^m$ is $f'(x) = x^{-c-1}(\log x)^{m-1}(-c(\log x) + m)$, hence $f(x)$ is monotone 
decreasing for $x \ge e^{m/c}$. Therefore the function $i^{-r}(\log i)^m$ is monotone 
decreasing for $i \ge e^{m/r}$ and since by assumption $i > n^{1/r}$, we get that for 
$n \ge e^m$ the function $f(i) = i^{-r}(\log i)^m$ takes its maximum at $i = n^{1/r}$. 
Hence the left-hand side of the inequality is bounded above by 

\[
n(r\log i)^m i^{-r} \le n r^m (\log n^{1/r})^m n^{-1} = (\log n)^m.
\]

The second inequality can be proven similarly. ■ 
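
Both inequalities of Lemma 8.4 are pointwise in $i$, so they can be checked directly on a grid. The Python sketch below does so for $i \ge 1$ in steps of 1/2. The function names, the grid and the small floating-point slack are illustrative assumptions.

```python
import math

def first_lhs(n, i, r, m):
    """n * i^r * (r log i)^m / (i^r + n)^2, left side of the first inequality."""
    return n * i**r * (r * math.log(i))**m / (i**r + n)**2

def second_lhs(n, i, r, m, xi):
    """n^xi * (r log i)^(xi*m) / (i^r + n)^xi, left side of the second inequality."""
    return n**xi * (r * math.log(i))**(xi * m) / (i**r + n)**xi

def lemma84_holds(n, r, m, xi, i_max=5000):
    """Check both inequalities for i = 1, 1.5, 2, ..., i_max."""
    if n < math.exp(m):
        raise ValueError("the lemma assumes n >= e^m")
    slack = 1 + 1e-12  # guards against floating-point rounding only
    for k in range(2, 2 * i_max + 1):
        i = k / 2
        if first_lhs(n, i, r, m) > slack * math.log(n)**m:
            return False
        if second_lhs(n, i, r, m, xi) > slack * math.log(n)**(xi * m):
            return False
    return True

print(lemma84_holds(100, 1.5, 2.0, 0.5))
```

The grid includes values on both sides of $i = n^{1/r}$, which are the two cases distinguished in the proof.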